CN114595923A - Group teaching recommendation system based on deep reinforcement learning - Google Patents

Group teaching recommendation system based on deep reinforcement learning

Info

Publication number
CN114595923A
Authority
CN
China
Prior art keywords
student
model
recommendation
group
teaching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210028554.7A
Other languages
Chinese (zh)
Other versions
CN114595923B (en)
Inventor
杨腾杰
左琳
陈柯弟
刘念伯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210028554.7A
Publication of CN114595923A
Application granted
Publication of CN114595923B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/08 Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Technology (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Development Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a group teaching recommendation system based on deep reinforcement learning, belonging to the technical field of education and information. The invention collects student data in the classroom through interactive methods such as voting, question answering, homework and quizzes, and provides the teaching plan with the maximum overall benefit for a given student group. The overall benefit can be expressed by a multi-objective optimization function, which may include, but is not limited to, the pass rate, the excellence rate and the average score. The invention uses deep reinforcement learning to perform goal-oriented teaching path planning for teachers and can process large-scale complex data. Meanwhile, the training process, which takes most of the time, is placed before and after class, so the teacher can obtain recommended teaching knowledge points immediately from students' classroom feedback during class.

Description

Group teaching recommendation system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of education and information, in particular to a group teaching recommendation system based on deep reinforcement learning.
Background
In conventional classroom teaching, a teacher often arranges learning content according to experience, because the details of students' learning are invisible and uncontrolled. The teacher does have various sources of information for assessing students' learning performance, including questions and answers, classroom assessment, and students' facial expressions, gestures and body movements. However, this information is often coarse and cannot cover every student or track every student's learning details, so teachers are often unable to design teaching paths at a fine granularity. The development of teaching assistance systems has eased this difficulty. A teaching assistance system provides various teacher-student interaction methods and records the interaction information, through which teachers can understand students' situations more accurately and deeply. In addition, a teaching assistance system can provide recommended teaching plans or learning plans for teachers or students, further relieving teachers' workload.
Patent application publication No. CN 112700688A discloses an intelligent classroom teaching assistance system. Student learning data is collected through interaction methods such as classroom voting, students are modeled and tracked on the basis of this data, and a recommended teaching plan is finally given according to the models of all students in the class. However, its recommendation algorithm simulates the students' learning process under various teaching plans based on the current student models and selects the teaching plan with the best simulated effect as the recommendation. To obtain a better recommendation, as many of the possible teaching plans as possible must be simulated, which incurs a large amount of computation and time. With more students and more knowledge points, the resulting wait may become unacceptably long, so the teacher cannot get timely feedback in the classroom.
Disclosure of Invention
The invention provides a group teaching recommendation system based on deep reinforcement learning, which can be used for improving the processing efficiency of group teaching recommendation.
The technical scheme adopted by the invention is as follows:
a group teaching recommendation system based on deep reinforcement learning, the system comprising: a user terminal, a knowledge point management module, a student data management module, a student model module, a pre-training module and a group teaching recommendation module;
the user terminal is used for teachers or students to log in the system and is an interactive input and output terminal of the user and the system;
the knowledge point management module is used by a teacher user to input knowledge point data and sends the knowledge point data to the student model module, the pre-training module and the group teaching recommendation module;
the student data management module is used by a student user to input student basic data and sends the student basic data to the student model module; it is also used to collect student classroom feedback in the classroom and send the classroom feedback to the group teaching recommendation module;
the student model module creates a student model based on currently input knowledge point data and student basic data according to a preset creating strategy and sends the student model to the pre-training module;
the pre-training module takes the student model created by the student model module as the learning subject, takes the data sent by the knowledge point management module and the student data management module as training data, and trains a preset initial group recommendation model to obtain a trained group recommendation model;
the initial group recommendation model comprises a first neural network model and a second neural network model, each comprising an input layer, at least one hidden layer and an output layer, wherein the input of the input layer is a student classroom feedback data sequence, the hidden layer is a neural network capable of processing sequence input, the output layer of the first neural network model outputs the recommendation degree of each knowledge point of the current course, and the output layer of the second neural network model outputs an evaluation value of the current classroom teaching, namely an evaluation value of the currently executed teaching action;
the group teaching recommendation module calls the group recommendation model trained by the pre-training module and, during course teaching, outputs teaching recommendation information based on the student classroom feedback of each class of the course and sends it to the corresponding teacher user; stores the student classroom feedback collected by the student data management module; and, during course teaching, updates and trains the group recommendation model according to the configured model update period, based on the student classroom feedback stored in the current period;
the output teaching recommendation information comprises the recommended knowledge point for the next class and the evaluation value of the current student classroom feedback data sequence, wherein the recommended knowledge point for the next class is the knowledge point with the maximum recommendation degree.
Further, the knowledge point data includes: knowledge point ID, name of the course it belongs to, knowledge point introduction, knowledge point content, knowledge point difficulty coefficient, ID of the prerequisite knowledge point of the knowledge point, classroom test questions matched with the knowledge point, and data related to the knowledge point.
Further, the student basic data includes: student ID, name, age, gender and student type; the student classroom feedback includes the following data: test question name, ID of the knowledge point it belongs to, test question content, IDs of the students taking the test, the students' test results, and the like.
Further, the student model module uses the student model to simulate real students in the group recommendation model training process of the pre-training module, and the student model is constructed as an Ebbinghaus memory model, a half-life memory model or a Bayesian knowledge tracing model;
and the description of the model includes:
describing the current grasp state of the virtual student for each knowledge point;
a process describing how a virtual student transitions from one state to another by learning;
classroom feedback after learning is described.
Further, the training of the initial group recommendation model by the pre-training module comprises:
the student model created by the student model module is used as a virtual student to form a class for training;
setting course requirement information and initializing network parameters of the initial group recommendation model;
taking the virtual students of the whole class as the environment and the first neural network model and the second neural network model of the initial group recommendation model as the agent, training the agent with a proximal policy optimization algorithm, and saving the current network parameters when a preset training end condition is met, to obtain the trained group recommendation model.
The course requirement information includes: the number of lessons, and the pass rate, excellence rate, average score and the like that need to be achieved when the course ends.
Further, training the agent using the proximal policy optimization algorithm includes:
step S1: recording the initial state of the virtual student;
step S2: judging whether the first cycle number reaches a preset first maximum cycle number, if so, executing a step S3; otherwise, the following processing is executed in a loop:
step S201: resetting the virtual student status to the initial status recorded in step S1;
step S202: circularly executing the step S202-1 to the step S202-4 until the circulation times reach the preset maximum sub-circulation times; the student classroom feedback of the virtual students in each cycle, the recommendation degree of the knowledge points output by the first neural network, the evaluation value output by the second neural network model and the reward value obtained by calculating the knowledge points learned last time according to the course requirement information through the classroom feedback of all the virtual students are recorded;
step S202-1: all the virtual students participate in classroom learning, and the virtual students give classroom feedback;
step S202-2: forming a student classroom feedback data sequence by the student classroom feedback given in the step S202-1, inputting the student classroom feedback data sequence into a first neural network, and obtaining a knowledge point of the next teaching based on the output of the student classroom feedback data sequence, namely taking the knowledge point with the maximum recommendation degree as the knowledge point of the next teaching;
step S202-3: forming a student classroom feedback data sequence by the student classroom feedback given in the step S202-1, inputting the student classroom feedback data sequence into a second neural network model, and obtaining an evaluation value of the student classroom feedback data sequence based on the output of the student classroom feedback data sequence;
step S202-4: all the virtual students learn the next teaching knowledge point obtained based on the first neural network;
step S3: judging whether a preset second maximum cycle number is reached, if so, ending; otherwise, the following processing is executed in a loop:
step S301: sampling the student classroom feedback data collected in the step S2;
step S302: calculating a first objective function (namely the output loss of the first neural network model) based on the sampled data, and adjusting the network parameters of the first neural network model according to a preset stochastic gradient ascent algorithm;
step S303: calculating a second objective function (namely the output loss of the second neural network model) based on the sampled data, and adjusting the network parameters of the second neural network model according to a preset stochastic gradient ascent algorithm;
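The control flow of steps S1 to S3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `env` object (with `reset`, `feedback` and `learn` methods), the callables `recommend`, `evaluate`, `reward_fn`, `update_actor` and `update_critic`, and all parameter names are hypothetical placeholders for the virtual class, the two neural network models and their parameter updates.

```python
import random


def collect_rollouts(env, recommend, evaluate, reward_fn, n_episodes, n_lessons):
    """Step S2: run the virtual class and record one trajectory per outer cycle."""
    dataset = []
    for _ in range(n_episodes):                    # first loop (S2)
        env.reset()                                # S201: back to the recorded initial state
        feedback_seq, trajectory = [], []
        for _ in range(n_lessons):                 # S202: sub-loop over lessons
            feedback = env.feedback()              # S202-1: students give classroom feedback
            feedback_seq.append(feedback)
            degrees = recommend(feedback_seq)      # S202-2: recommendation degrees
            point = max(range(len(degrees)), key=degrees.__getitem__)
            value = evaluate(feedback_seq)         # S202-3: evaluation value
            trajectory.append({
                "feedback": list(feedback_seq),
                "degrees": degrees,
                "point": point,
                "value": value,
                # reward for the previously learned point, derived from this feedback
                "reward": reward_fn(feedback),
            })
            env.learn(point)                       # S202-4: class learns the chosen point
        dataset.append(trajectory)
    return dataset


def train(env, recommend, evaluate, reward_fn, update_actor, update_critic,
          n_episodes, n_lessons, n_updates, batch_size, rng=None):
    """Steps S2 and S3: collect data, then repeatedly sample and update both networks."""
    rng = rng or random.Random()
    dataset = collect_rollouts(env, recommend, evaluate, reward_fn,
                               n_episodes, n_lessons)
    for _ in range(n_updates):                     # second loop (S3)
        batch = rng.sample(dataset, min(batch_size, len(dataset)))  # S301
        update_actor(batch)                        # S302: first neural network model
        update_critic(batch)                       # S303: second neural network model
    return dataset
```

Passing the updates in as callbacks keeps the sketch independent of any particular deep learning library; in practice they would compute the two objective functions and apply the gradient step.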
the recommendation and training process of the group teaching recommendation module comprises the following steps:
initializing the group recommendation model with the network parameters stored after training by the pre-training module is finished;
after the teacher starts teaching, forming, through the user terminal, a student classroom feedback data sequence from the students' classroom feedback in each class, and inputting it into the first neural network model and the second neural network model of the group recommendation model respectively; obtaining the recommended teaching knowledge point for the next class and the corresponding evaluation value from their outputs; storing the student classroom feedback data sequence, the recommended teaching knowledge point and the evaluation value; and sending the recommended teaching knowledge point for the next class to the corresponding teacher;
after class, updating and training the group recommendation model based on the historical data stored in the current update period, wherein the historical data comprises multiple data records, each comprising a student classroom feedback data sequence, a recommended teaching knowledge point and an evaluation value.
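The in-class recommendation and after-class update described above might be organized as in the following sketch. The class name `OnlineRecommender`, its method names, and the choice to measure the update period in stored records are all assumptions made for illustration; `recommend`, `evaluate` and `update` stand in for the trained networks and their retraining routine.

```python
class OnlineRecommender:
    """Hypothetical wrapper around a trained group recommendation model."""

    def __init__(self, recommend, evaluate, update, update_period):
        self.recommend = recommend          # first network: sequence -> recommendation degrees
        self.evaluate = evaluate            # second network: sequence -> evaluation value
        self.update = update                # callback that retrains on the stored records
        self.update_period = update_period  # records per update cycle (assumed unit)
        self.feedback_seq = []
        self.history = []

    def in_class(self, feedback):
        """Fast path used during class: no training, only two forward passes."""
        self.feedback_seq.append(feedback)
        degrees = self.recommend(self.feedback_seq)
        # The recommended knowledge point is the one with the maximum degree.
        point = max(range(len(degrees)), key=degrees.__getitem__)
        value = self.evaluate(self.feedback_seq)
        self.history.append((list(self.feedback_seq), point, value))
        return point, value

    def after_class(self):
        """Slow path used after class: retrain once enough records are stored."""
        if len(self.history) >= self.update_period:
            self.update(self.history)
            self.history = []
```

Keeping training out of `in_class` reflects the stated goal of returning a recommendation immediately while the time-consuming work happens before and after class.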
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects:
compared with the prior art, the invention collects student data in the classroom through interactive methods such as voting, question answering, homework and quizzes, and provides the teaching plan with the maximum overall benefit for a given student group (such as a whole class). The overall benefit can be expressed by a multi-objective optimization function, which may include, but is not limited to, the pass rate, the excellence rate and the average score. The invention uses deep reinforcement learning to perform goal-oriented teaching path planning for teachers and can process large-scale complex data. Meanwhile, the training process, which takes most of the time, is placed before and after class, so the teacher can obtain recommended teaching knowledge points immediately from students' classroom feedback during class.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a block diagram of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention;
Fig. 2 is a teaching process data sequence chart of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention;
Fig. 3 is a flowchart of the pre-training module of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention;
Fig. 4 is a flowchart of the group teaching recommendation module of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention;
Fig. 5 is a clip function diagram of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
An embodiment of the invention provides a group teaching recommendation system based on deep reinforcement learning. As shown in Fig. 1, the system comprises: a user terminal (used by teachers or students to log in to the system), a knowledge point management module, a student data management module, a student model module, a pre-training module and a group teaching recommendation module. Group teaching recommendation is realized through data interaction among the modules as follows:
(1) a user terminal (teacher user) inputs knowledge point data through the knowledge point management module, and the knowledge point management module sends the knowledge point data to the student model module, the pre-training module and the group teaching recommendation module;
(2) a user terminal (student user) inputs student basic data through a student data management module, and the student data management module sends the student basic data to a student model module and a pre-training module; in the classroom, through the interaction with the user terminal, the classroom feedback of students is collected and sent to the group teaching recommendation module;
(3) the student model module creates a student model based on the currently input related information (knowledge point data and student basic data) according to a preset creating strategy and sends the student model to the pre-training module;
(4) the pre-training module takes the student model created by the student model module as the learning subject, takes the data sent by the knowledge point management module and the student data management module as training data, and trains a preset initial group recommendation model to obtain a trained group recommendation model;
(5) the group teaching recommendation module calls a group recommendation model trained by the pre-training module, and in the course teaching process, teaching recommendation information is output and sent to corresponding teacher users in combination with the classroom feedback of students of each class of the curriculum; and storing student classroom feedback collected by the student data management module; and updating and training the group recommendation model based on the classroom feedback of the students stored in the current period according to the configured model updating period in the course teaching process.
In this embodiment, the knowledge point management module is configured to: receive and store the knowledge point data (namely knowledge point information) input by an expert; and provide the received data as a data set for other modules to use. The expert is a teacher or a group of teachers with extensive teaching experience who are familiar with the knowledge points of the course; the knowledge point information comprises the knowledge point ID, name of the course it belongs to, knowledge point introduction, knowledge point content, knowledge point difficulty coefficient, ID of the prerequisite knowledge point of the knowledge point, classroom test questions matched with the knowledge point, and data related to the knowledge point.
In this embodiment, the student data management module is configured to: receive and store the student basic information input by the students; collect and store the students' classroom feedback data through classroom test interaction; and provide the received data as data sets for other modules to use. The student basic information comprises the student ID, name, age, gender and student type; the classroom feedback data comprises the test question name, ID of the knowledge point it belongs to, test question content, IDs of the students taking the test and the students' test results; the data sets comprise a student basic information data set and a classroom feedback data set. The data sequence generated in the teaching process in this embodiment is shown in Fig. 2.
In this embodiment, the student model module is configured to create a student model based on the student basic information data set, and to simulate, with the student model, the group recommendation model training process in the pre-training module. The student model is implemented with the Ebbinghaus memory model and describes the following:
(1) describing the virtual student's current mastery state for each knowledge point; the formula in this embodiment is as follows:
Figure BDA0003465431020000061
where $P_i$ represents the probability that the student has mastered the i-th knowledge point,
Figure BDA0003465431020000062
represents the mastery probability of the prerequisite knowledge point of the i-th knowledge point, $\theta$ is a difficulty coefficient determined by the specific circumstances of the student and the knowledge point, $D$ is the time elapsed since the knowledge point was last learned, and $S$ is the total number of times the knowledge point has been learned;
(2) describing how a virtual student transitions from one state to another through learning; in this embodiment this is done by changing $D$ and $S$ in the above formula;
(3) describing the classroom feedback after learning; in this embodiment, a random number between 0 and 1 is sampled, and if the random number is smaller than $P_i$ in the above formula, the question on the corresponding knowledge point is considered to be answered correctly; otherwise it is considered to be answered incorrectly.
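The three parts of the student model (mastery state, state transition through learning, and sampled classroom feedback) can be illustrated with a small sketch. The mastery formula itself is an image that is not reproduced in this text, so the sketch substitutes an ASSUMED exponential forgetting curve, prerequisite mastery multiplied by exp(-θ·D/S); that expression, the class name `VirtualStudent` and all attribute names are hypothetical. Only the feedback rule (answer correctly when a uniform random number is below $P_i$) is taken directly from the description.

```python
import math
import random


class VirtualStudent:
    """Hypothetical virtual student with a per-knowledge-point memory state."""

    def __init__(self, difficulty, prerequisites, rng=None):
        self.difficulty = difficulty                  # theta for each knowledge point
        self.prerequisites = prerequisites            # prerequisite index or None (acyclic)
        self.elapsed = [0.0] * len(difficulty)        # D: time since the point was last learned
        self.times_learned = [0] * len(difficulty)    # S: total number of times learned
        self.rng = rng or random.Random()

    def mastery(self, i):
        """P_i under an ASSUMED forgetting curve (not the formula from the patent image)."""
        if self.times_learned[i] == 0:
            return 0.0
        retention = math.exp(-self.difficulty[i] * self.elapsed[i] / self.times_learned[i])
        pre = self.prerequisites[i]
        pre_mastery = 1.0 if pre is None else self.mastery(pre)
        return pre_mastery * retention

    def learn(self, i, dt=1.0):
        """State transition: one lesson on point i changes D and S."""
        for j in range(len(self.elapsed)):
            if self.times_learned[j] > 0:
                self.elapsed[j] += dt                 # previously learned points age by dt
        self.elapsed[i] = 0.0                         # point i was just learned
        self.times_learned[i] += 1

    def answer(self, i):
        """Classroom feedback: correct if a uniform sample in [0, 1) is below P_i."""
        return self.rng.random() < self.mastery(i)


student = VirtualStudent(difficulty=[0.5, 0.5], prerequisites=[None, 0],
                         rng=random.Random(0))
student.learn(0)
print(student.mastery(0), student.answer(0))
```

A class of virtual students is then simply a list of such objects, each with its own difficulty coefficients.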
In this embodiment, the pre-training module is configured to train the group recommendation model before class using the proximal policy optimization algorithm, based on the student model module; the flow is shown in Fig. 3. The group recommendation model consists of a recommendation neural network and a critic neural network. Because student feedback data is a time-ordered sequence, the recommendation neural network must be able to process sequence input and is therefore a recurrent neural network. The critic neural network has a similar structure: both networks comprise an input layer, one or more hidden layers and an output layer, where the input layer receives the student classroom feedback sequence and the number of hidden layers in the two networks may be the same or different. The main difference lies in the output layer. The output layer of the recommendation neural network performs classification output using a softmax function, and its output represents the recommendation degree of each knowledge point of the current course for the next class (when forming the recommendation result, the knowledge point with the maximum recommendation degree is taken). The output layer of the critic neural network uses a linear function, and its output represents the evaluation value of the action at each sampling moment, i.e., the evaluation of the current classroom teaching.
The group recommendation model is trained using Proximal Policy Optimization (PPO), and the specific training process is as follows:
(1) the student models created by the student model module act as virtual students and form a class that participates in training, for example a class of 20 students;
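To make the difference between the two output layers concrete, the following dependency-free sketch builds two small recurrent networks that share the same structure and differ only in the output head. It is an illustration under assumptions, not the networks of the embodiment: the class `TinyRecurrentNet`, the tanh recurrent cell, the random initialization, the encoding of feedback as one number per knowledge point, and the sizes used are all hypothetical, and no training is shown.

```python
import math
import random


def softmax(z):
    m = max(z)                                   # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class TinyRecurrentNet:
    """Input layer -> one recurrent hidden layer -> softmax or linear output layer."""

    def __init__(self, n_in, n_hidden, n_out, head, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W_in = init(n_hidden, n_in)
        self.W_rec = init(n_hidden, n_hidden)
        self.W_out = init(n_out, n_hidden)
        self.n_hidden = n_hidden
        self.head = head                         # "softmax" (recommendation) or "linear" (critic)

    def forward(self, sequence):
        h = [0.0] * self.n_hidden
        for x in sequence:                       # consume the feedback sequence step by step
            a = matvec(self.W_in, x)
            b = matvec(self.W_rec, h)
            h = [math.tanh(u + v) for u, v in zip(a, b)]
        out = matvec(self.W_out, h)
        return softmax(out) if self.head == "softmax" else out


n_points = 5                                     # knowledge points in the course (example value)
recommender = TinyRecurrentNet(n_points, 64, n_points, head="softmax")
critic = TinyRecurrentNet(n_points, 64, 1, head="linear", seed=1)

# Assumed encoding: per lesson, the fraction of correct answers per knowledge point.
feedback_seq = [[0.6, 0.0, 0.0, 0.0, 0.0], [0.7, 0.4, 0.0, 0.0, 0.0]]
degrees = recommender.forward(feedback_seq)      # recommendation degree per knowledge point
next_point = max(range(len(degrees)), key=degrees.__getitem__)
value = critic.forward(feedback_seq)[0]          # scalar evaluation of the current teaching
```

The softmax head yields a probability-like recommendation degree per knowledge point, while the linear head yields a single unbounded evaluation value.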
(2) setting course requirement information;
(3) initializing the group recommendation model, namely initializing the network parameters of the recommendation neural network and the critic neural network;
(4) taking the virtual students of the whole class as the environment and the recommendation neural network and the critic neural network as the agent, and training the agent using the proximal policy optimization algorithm;
(5) after training of the recommendation neural network and the critic neural network is completed, saving the current network parameters of the recommendation neural network and the critic neural network and providing them to the group teaching recommendation module.
As a possible implementation, in the training process above: in (2), the course requirement information comprises 80 lessons, a pass rate of 0.8 and an excellence rate of 0.2 to be reached at the end of the course, with a higher average score being better; in (3), the initialized group recommendation model uses neural networks with two layers and a hidden layer of 64 neurons; in (4), the proximal policy optimization algorithm proceeds as follows:
(1) recording the initial state of the virtual student;
(2) repeat the following steps a specified number of times:
(2-a) repeat the following steps a specified number of times:
I. resetting the virtual student state to the initial state saved in (1);
II. repeat the following steps until the number of lessons learned reaches the set number of lessons; in each iteration, record the test results returned by the students, the knowledge point output by the recommendation neural network, the evaluation value output by the critic neural network, and the reward value for the previously learned knowledge point, which is calculated from the classroom feedback of all students according to the course requirement information. The reward value formula is as follows:
$\mathrm{Reward} = \lambda_1 R_p + \lambda_2 R_e + \lambda_3 R_a$
where $R_p$ is the pass rate, $R_e$ is the excellence rate, $R_a$ is the average mastery probability of all students for the knowledge point, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the weights of the three terms; the weights are greater than or equal to 0 and are empirical values, which the present invention does not limit. In this embodiment they are 5, 3 and 1, respectively.
1) all virtual students take the classroom test and return their test results;
2) the classroom feedback is fed into the recommendation neural network, which outputs the recommended knowledge point for the next lesson;
3) the classroom feedback is fed into the critic neural network, which outputs an evaluation value;
4) all virtual students learn the knowledge point output by the recommendation neural network;
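The reward used in step II can be sketched as a plain weighted sum. This is a minimal illustration, not the patented implementation: the names `RewardWeights`, `rate_at_least` and `reward` are hypothetical, and the score thresholds for "pass" and "excellent" (60 and 90) are assumptions that do not come from the source. Only the weights 5, 3 and 1 are taken from this embodiment.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    """Weights lambda_1..lambda_3; the defaults 5, 3, 1 are this embodiment's values."""
    pass_rate: float = 5.0
    excellent_rate: float = 3.0
    mastery: float = 1.0


def rate_at_least(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores that reach the given threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s >= threshold) / len(scores)


def reward(r_p: float, r_e: float, r_a: float,
           w: RewardWeights = RewardWeights()) -> float:
    """Reward = lambda_1 * R_p + lambda_2 * R_e + lambda_3 * R_a."""
    if min(w.pass_rate, w.excellent_rate, w.mastery) < 0:
        raise ValueError("weights must be greater than or equal to 0")
    return w.pass_rate * r_p + w.excellent_rate * r_e + w.mastery * r_a


# Hypothetical thresholds (not from the source): pass >= 60, excellent >= 90.
scores = [55.0, 72.0, 91.0, 64.0]
r_p = rate_at_least(scores, 60.0)  # 3 of 4 students pass
r_e = rate_at_least(scores, 90.0)  # 1 of 4 students is excellent
example_reward = reward(r_p, r_e, r_a=0.5)  # r_a is supplied by the student model
```

Because the pass rate carries the largest weight, a recommendation that lifts one more student over the pass threshold raises the reward more than one that produces one more excellent student.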
(2-b) repeat the following steps a specified number of times:
I. sample from the data collected in (2-a);
II. compute the objective function from the sampled data and train the recommendation neural network by stochastic gradient ascent, using the following formula:
Figure BDA0003465431020000071
where $\theta_k$ denotes the parameters of the recommendation neural network at the k-th training; $D_k$ denotes the sampled data set; $\tau$ denotes the data sampled along one teaching path, i.e. one complete teaching trajectory (for example, if the course has 40 class sessions, a complete run of teaching covers 40 sessions, and those 40 sessions form one group of data; $D_k$ consists of many such groups $\tau$); $T$ is the course duration; and $\pi_\theta(a_t \mid s_t)$ denotes the probability that, when the parameters are $\theta$ and the classroom feedback input at time $t$ is $s_t$, the output is $a_t$. The clip function is illustrated in FIG. 4. The inputs of clip() are $r_t(\theta)$
Figure BDA0003465431020000081
and the boundary value $\epsilon$. If $r_t(\theta) \le 1-\epsilon$, then clip() $= 1-\epsilon$; if $r_t(\theta) \ge 1+\epsilon$, then clip() $= 1+\epsilon$; and if $1-\epsilon < r_t(\theta) < 1+\epsilon$, then clip() $= r_t(\theta)$. In this embodiment, the boundary value $\epsilon$ is 0.1.
Figure BDA0003465431020000082
denotes the advantage value of the action at time $t$, computed as follows:
Figure BDA0003465431020000083
$$\xi_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$
where $\xi_t$ is the intermediate parameter at time $t$, i.e. the intermediate parameter at each time is calculated by the formula for $\xi_t$; $\gamma$ is the discount factor, 0.99 in this embodiment; $T$ is the total time so far; $r_t$ is the reward value obtained at time $t$; and $V(s_t)$ is the evaluation value given by the critic neural network at time $t$;
III. compute the objective function from the sampled data and train the critic neural network by stochastic gradient ascent, using the following formula:
Figure BDA0003465431020000084
where
Figure BDA0003465431020000085
denotes the parameters of the critic neural network at the k-th training, and
Figure BDA0003465431020000086
denotes, based on the current network parameters
Figure BDA0003465431020000087
the output (evaluation value) of the critic neural network at time $t$.
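The quantities in step (2-b) that are spelled out in the text, namely the clip function and the intermediate parameter $\xi_t$, can be sketched as below. The formula images for the objective and the advantage are not reproduced in this text, so two parts of the sketch are assumptions: `discounted_sums` treats the advantage as a $\gamma$-discounted sum of later $\xi$ values, and `clipped_surrogate` uses the standard clipped term from the proximal policy optimization literature. All function names are hypothetical.

```python
from typing import List, Sequence


def clip(ratio: float, eps: float = 0.1) -> float:
    """Clamp r_t(theta) to [1 - eps, 1 + eps]; eps = 0.1 in this embodiment."""
    return max(1.0 - eps, min(1.0 + eps, ratio))


def td_residuals(rewards: Sequence[float], values: Sequence[float],
                 gamma: float = 0.99) -> List[float]:
    """xi_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds V(s_0), ..., V(s_T), one entry more than `rewards`.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def discounted_sums(xis: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Assumed advantage: A_t = sum over l >= 0 of gamma**l * xi_{t+l}.

    Computed with a backward recursion A_t = xi_t + gamma * A_{t+1}.
    """
    advantages = [0.0] * len(xis)
    running = 0.0
    for t in reversed(range(len(xis))):
        running = xis[t] + gamma * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratios: Sequence[float], advantages: Sequence[float],
                      eps: float = 0.1) -> float:
    """Mean over time steps of min(r_t * A_t, clip(r_t) * A_t) (assumed form)."""
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal in length")
    terms = [min(r * a, clip(r, eps) * a) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)
```

Clipping limits how much a single update can change the recommendation policy: once the ratio leaves the interval around 1, a further move in the favorable direction no longer increases the objective.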
In this embodiment, the group teaching recommendation module receives and stores the students' classroom feedback data during class, gives recommended knowledge points for teaching based on that feedback, and further trains the group recommendation model on the classroom feedback data after class. The recommendation and training process is shown in FIG. 5:
(1) initialize the group recommendation model with the parameters saved when the pre-training module finished training;
(2) the teacher starts teaching, and the students give classroom feedback, which is fed into the recommendation neural network and the critic neural network; the recommendation neural network outputs the recommended teaching knowledge point, the critic neural network outputs an evaluation value, and all of these data are stored;
(3) repeat step (2), with the teacher teaching according to the recommended knowledge points. After a certain number of repetitions, compute the objective function after class from the data stored so far and train the group recommendation model again;
(4) repeat step (3) until the course ends.
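The recommend-then-retrain loop of steps (1) to (4) can be sketched as follows. This is an illustrative outline under stated assumptions: `run_course` and its callable parameters are hypothetical stand-ins for the two networks and the data collection, and the recommended knowledge point is taken as the index with the largest recommendation degree, as claim 1 states. Retraining here receives all data stored so far, following the wording of step (3).

```python
from typing import Callable, List, Sequence, Tuple

Feedback = Sequence[float]
Record = Tuple[Feedback, int, float]  # (feedback, recommended point, evaluation)


def run_course(
    recommend: Callable[[Feedback], Sequence[float]],  # recommendation network
    evaluate: Callable[[Feedback], float],             # critic network
    retrain: Callable[[List[Record]], None],           # after-class update
    get_feedback: Callable[[int], Feedback],           # classroom feedback source
    n_sessions: int,
    update_period: int,
) -> List[int]:
    """Return the knowledge point recommended after each class session."""
    if n_sessions < 1 or update_period < 1:
        raise ValueError("n_sessions and update_period must be positive")
    history: List[Record] = []
    recommended: List[int] = []
    for session in range(1, n_sessions + 1):
        feedback = get_feedback(session)
        degrees = recommend(feedback)
        # Pick the knowledge point with the maximum recommendation degree.
        point = max(range(len(degrees)), key=degrees.__getitem__)
        value = evaluate(feedback)
        history.append((feedback, point, value))
        recommended.append(point)
        # After a fixed number of sessions, retrain on the data stored so far.
        if session % update_period == 0:
            retrain(list(history))
    return recommended
```

Passing the networks in as callables keeps the loop independent of any particular deep learning library, so the same outline applies whether the model was just loaded from pre-training or has already been updated during the course.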
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
What has been described above are merely some embodiments of the present invention. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the inventive concept, and that such changes and modifications remain within the spirit and scope of the invention.

Claims (10)

1. A group teaching recommendation system based on deep reinforcement learning is characterized by comprising: the system comprises a user terminal, a knowledge point management module, a student data management module, a student model module, a pre-training module and a group teaching recommendation module;
the user terminal is used for teachers or students to log in the system and is an interactive input and output terminal of the user and the system;
the knowledge point management module is used for a teacher user to input knowledge point data and send the knowledge point data to the student model module, the pre-training module and the group teaching recommendation module;
the student data management module is used for inputting student basic data by a student user and sending the student basic data to the student model module; the student feedback acquisition module is used for acquiring student classroom feedback in a classroom and sending the classroom feedback to the group teaching recommendation module;
the student model module creates a student model based on currently input knowledge point data and student basic data according to a preset creating strategy and sends the student model to the pre-training module;
the pre-training module takes a student model created by the student model module as a learning main body, takes data sent by the knowledge point management module and the student data management module as training data, and trains a preset initial group recommendation model to obtain a trained group recommendation model; the initial group recommendation model comprises a first neural network model and a second neural network model, the first neural network model and the second neural network model comprise an input layer, at least one hidden layer and an output layer, wherein the input layer is a student classroom feedback data sequence, the hidden layer is a neural network capable of processing sequence input, and the output layer of the first neural network model is used for outputting recommendation degrees of all knowledge points of a current course; the output layer of the second neural network model is used for outputting the evaluation value of the current classroom teaching;
the group teaching recommendation module calls a group recommendation model trained by the pre-training module, and outputs teaching recommendation information in combination with student classroom feedback of each class of the curriculum in the course teaching process and sends the teaching recommendation information to corresponding teacher users; and storing student classroom feedback collected by the student data management module; updating and training the group recommendation model based on the classroom feedback of students stored in the current period according to the configured model updating period in the course teaching process;
the output teaching recommendation information comprises a recommendation knowledge point of the next class and an evaluation value of a feedback data sequence of the current student class, wherein the recommendation knowledge point of the next class is the knowledge point with the maximum recommendation degree.
2. The group education recommendation system of claim 1, wherein the knowledge point data includes: knowledge point ID, belonging course name, knowledge point introduction, knowledge point content, knowledge point difficulty coefficient, preposed knowledge point ID of the knowledge point, classroom test question matched with the knowledge point, and knowledge point related data.
3. The group instruction recommendation system of claim 1 wherein the student base data comprises: student's school number, name, age, gender, age and student type; the student classroom feedback comprises the following data: the test subject name, the affiliated knowledge point ID, the test subject content, the ID of the participating test student, the test result of the student and the like.
4. The group education recommendation system of claim 1 wherein the student model module uses the student model to simulate real students taking part in the group recommendation model training process of the pre-training module, the student model being built on an Ebbinghaus memory model, a half-life memory model or a Bayesian knowledge tracing model;
and the description of the model includes:
describing the current grasp state of the virtual student for each knowledge point;
a process describing how a virtual student transitions from one state to another by learning;
classroom feedback after learning is described.
5. The group instruction recommendation system of claim 1, wherein the training of the initial group recommendation model by the pre-training module comprises:
the student model created by the student model module is used as a virtual student to form a class for training;
setting course requirement information and initializing network parameters of the initial group recommendation model;
taking the virtual students of the whole class as the environment and the first neural network model and the second neural network model of the initial group recommendation model as the agent, training the agent with a proximal policy optimization algorithm, and saving the current network parameters when a preset training end condition is met, to obtain the trained group recommendation model.
6. The group education recommendation system of claim 5 wherein the course requirement information includes: the number of courses, and the passing rate, excellence rate and average score to be achieved at the end of the course.
7. The group instruction recommendation system of claim 1, wherein training the agent using the proximal policy optimization algorithm comprises:
step S1: recording the initial state of the virtual student;
step S2: judging whether the first cycle number reaches a preset first maximum cycle number, if so, executing step S3; otherwise, the following processing is executed in a loop:
step S201: resetting the virtual student status to the initial status recorded in step S1;
step S202: circularly executing the step S202-1 to the step S202-4 until the circulation times reach the preset maximum sub-circulation times; the student classroom feedback of the virtual students in each cycle, the recommendation degree of the knowledge points output by the first neural network model, the evaluation value output by the second neural network model and the reward value obtained by calculating the knowledge points learned last time according to the course requirement information through the classroom feedback of all the virtual students are recorded;
step S202-1: all the virtual students participate in classroom learning, and the virtual students give classroom feedback;
step S202-2: forming a student classroom feedback data sequence by the student classroom feedback given in the step S202-1, inputting the student classroom feedback data sequence into a first neural network model, and obtaining a knowledge point of the next teaching based on the output of the student classroom feedback data sequence, namely taking the knowledge point with the maximum recommendation degree as the knowledge point of the next teaching;
step S202-3: forming a student classroom feedback data sequence by the student classroom feedback given in the step S202-1, inputting the student classroom feedback data sequence into a second neural network model, and obtaining an evaluation value of the student classroom feedback data sequence based on the output of the student classroom feedback data sequence;
step S202-4: all the virtual students learn the next teaching knowledge point obtained based on the first neural network model;
step S3: judging whether a preset second maximum cycle number is reached, if so, ending; otherwise, the following processing is executed in a loop:
step S301: sampling the student classroom feedback data collected in the step S2;
step S302: calculating a first objective function based on the sampled data, and adjusting network parameters of the first neural network model according to a preset stochastic gradient ascent algorithm, wherein the first objective function is used for representing the output loss of the first neural network model;
step S303: calculating a second objective function based on the sampled data, and adjusting network parameters of the second neural network model according to a preset stochastic gradient ascent algorithm, wherein the second objective function is used for representing the output loss of the second neural network model.
8. The group instruction recommendation system of claim 7 wherein the first objective function is:
Figure FDA0003465431010000031
wherein:
$\theta_{k+1}$ denotes the network parameters of the first neural network at the (k+1)-th training;
$D_k$ denotes the sampled data set;
$T$ denotes the course duration;
$\tau$ denotes the data sampled along one teaching path;
$\pi_\theta(a_t \mid s_t)$ denotes the probability that, when the network parameters are $\theta$ and the student classroom feedback input at time $t$ is $s_t$, the output is $a_t$;
the inputs of the function clip() are $r_t(\theta)$ and the boundary value $\epsilon$: if $r_t(\theta) \le 1-\epsilon$, then clip() $= 1-\epsilon$; if $r_t(\theta) \ge 1+\epsilon$, then clip() $= 1+\epsilon$; and if $1-\epsilon < r_t(\theta) < 1+\epsilon$, then clip() $= r_t(\theta)$; wherein
Figure FDA0003465431010000032
Figure FDA0003465431010000033
denotes the advantage value of the action at time $t$, calculated as:
Figure FDA0003465431010000034
$$\xi_t = r_t + \gamma V(s_{t+1}) - V(s_t);$$
where $\xi_t$ denotes the intermediate parameter at time $t$, $\gamma$ denotes a preset discount factor, $r_t$ denotes the reward value obtained at time $t$, and $V(s_t)$ denotes the evaluation value output by the second neural network model at time $t$;
the second objective function is:
Figure FDA0003465431010000035
where
Figure FDA0003465431010000041
denotes the network parameters of the second neural network model at the (k+1)-th training, and
Figure FDA0003465431010000042
denotes, based on the current network parameters
Figure FDA0003465431010000043
the evaluation value output by the second neural network model at time $t$.
9. The group education recommendation system of claim 1 wherein the recommendation and training process of the group education recommendation module is:
initializing a group recommendation model, and initializing by using the network parameters stored after the training of a pre-training module is finished;
after the teacher starts teaching, forming a student classroom feedback data sequence based on student classroom feedback of students in each class through a user terminal, and respectively inputting a first neural network model and a second neural network model of a group recommendation model; obtaining recommended teaching knowledge points of the next classroom and corresponding evaluation values based on the output of the recommended teaching knowledge points, and storing the student classroom feedback data sequence, the recommended teaching knowledge points and the evaluation values; and sending the recommended teaching knowledge points of the next classroom to the corresponding teachers;
and after the class, updating and training the group recommendation model based on historical data stored in the current updating period, wherein the historical data comprises a plurality of groups of data records, and each group of data comprises a student classroom feedback data sequence, a recommended teaching knowledge point and an evaluation value.
10. The group instruction recommendation system of claim 1 wherein the hidden layers of the first and second neural network models are long short-term memory (LSTM) recurrent neural networks.
CN202210028554.7A 2022-01-11 2022-01-11 Group teaching recommendation system based on deep reinforcement learning Active CN114595923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210028554.7A CN114595923B (en) 2022-01-11 2022-01-11 Group teaching recommendation system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114595923A true CN114595923A (en) 2022-06-07
CN114595923B CN114595923B (en) 2023-04-28

Family

ID=81803873

Country Status (1)

Country Link
CN (1) CN114595923B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant