CN114595923A

CN114595923A - Group teaching recommendation system based on deep reinforcement learning

Info

Publication number: CN114595923A
Application number: CN202210028554.7A
Authority: CN
Inventors: 杨腾杰; 左琳; 陈柯弟; 刘念伯
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2022-01-11
Filing date: 2022-01-11
Publication date: 2022-06-07
Anticipated expiration: 2042-01-11
Also published as: CN114595923B

Abstract

The invention discloses a group teaching recommendation system based on deep reinforcement learning, and belongs to the technical field of education and information. The invention collects student data in the classroom through interactive methods such as voting, question answering, homework and quizzes, and provides the teaching plan with the maximum overall benefit for a given student group. The overall benefit can be expressed by a multi-objective optimization function, which may include, but is not limited to, the pass rate, the excellent rate and the average score. The invention uses a deep reinforcement learning method to carry out goal-oriented teaching path planning for teachers, and can process large-scale, complex data. In addition, the training process, which takes most of the time, is carried out before and after class, so that during class the teacher can immediately obtain recommended knowledge points to teach from the students' classroom feedback.

Description

Group teaching recommendation system based on deep reinforcement learning

Technical Field

The invention relates to the technical field of education and information, in particular to a group teaching recommendation system based on deep reinforcement learning.

Background

In conventional classroom teaching, a teacher often arranges learning content according to experience, because the learning details of students are neither visible nor controllable. Of course, the teacher can draw on a variety of information to assess students' learning performance, including questions and answers, classroom assessment, and students' facial expressions, gestures and body movements. However, this information is often coarse and cannot cover every student or track every student's learning details, so teachers are often unable to design teaching paths at a fine granularity. The development of teaching assistance systems has eased this difficulty. A teaching assistance system provides various teacher-student interaction methods and records the interaction information, through which teachers can understand students' conditions more accurately and in greater depth. In addition, a teaching assistance system can provide recommended teaching plans or learning plans for teachers or students, further relieving teachers' workload.

Patent application publication No. CN 112700688A discloses an intelligent classroom teaching assistance system. Student learning data are collected through interaction methods such as classroom voting, the students are modeled and tracked on the basis of these data, and a recommended teaching plan is finally given according to the models of all students in the class. However, its recommendation algorithm simulates the students' learning process under various teaching plans based on the current student models, and selects the teaching plan with the best simulated effect as the recommendation. To obtain a better recommendation, as many of the possible teaching plans as possible must be simulated, which entails a large amount of computation and time. With more students and more knowledge points, the resulting wait may be unacceptably long, so that the teacher cannot get timely feedback in the classroom.

Disclosure of Invention

The invention provides a group teaching recommendation system based on deep reinforcement learning, which can be used for improving the processing efficiency of group teaching recommendation.

The technical scheme adopted by the invention is as follows:

a group teaching recommendation system based on deep reinforcement learning, the system comprising: a user terminal, a knowledge point management module, a student data management module, a student model module, a pre-training module and a group teaching recommendation module;

the user terminal is used for teachers or students to log in the system and is an interactive input and output terminal of the user and the system;

the knowledge point management module is used by a teacher user to input knowledge point data, and sends the knowledge point data to the student model module, the pre-training module and the group teaching recommendation module;

the student data management module is used by a student user to input student basic data, and sends the student basic data to the student model module; it is also used for collecting student classroom feedback during class and sending the classroom feedback to the group teaching recommendation module;

the student model module creates a student model based on currently input knowledge point data and student basic data according to a preset creating strategy and sends the student model to the pre-training module;

the pre-training module takes the student model created by the student model module as the learning main body, takes the data sent by the knowledge point management module and the student data management module as training data, and trains a preset initial group recommendation model to obtain a trained group recommendation model; the initial group recommendation model comprises a first neural network model and a second neural network model, each of which comprises an input layer, at least one hidden layer and an output layer, wherein the input of the input layer is a student classroom feedback data sequence, the hidden layer is a neural network capable of processing sequence input, the output layer of the first neural network model is used for outputting the recommendation degrees of all knowledge points of the current course, and the output layer of the second neural network model is used for outputting an evaluation value of the current classroom teaching, namely an evaluation value of the currently executed teaching behavior;

the group teaching recommendation module calls the group recommendation model trained by the pre-training module and, during course teaching, outputs teaching recommendation information in combination with the student classroom feedback of each class of the course and sends the teaching recommendation information to the corresponding teacher user; it stores the student classroom feedback collected by the student data management module; and, during course teaching, it updates and trains the group recommendation model according to a configured model update period, based on the student classroom feedback stored in the current period;

the output teaching recommendation information comprises the recommended knowledge point for the next class and an evaluation value of the current student classroom feedback data sequence, wherein the recommended knowledge point for the next class is the knowledge point with the maximum recommendation degree.

Further, the knowledge point data includes: knowledge point ID, name of the course to which it belongs, knowledge point introduction, knowledge point content, knowledge point difficulty coefficient, IDs of the prerequisite knowledge points of the knowledge point, classroom test questions matched with the knowledge point, and knowledge point related data.

Further, the student basic data includes: student's school number, name, age, gender, age and student type; the student classroom feedback comprises the following data: the test subject name, the affiliated knowledge point ID, the test subject content, the ID of the participating test student, the test result of the student and the like.
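The records listed above can be pictured as simple data structures. The following is a minimal sketch only: the class names, field names and the 0/1 encoding in `feedback_sequence` are illustrative assumptions and are not specified in this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KnowledgePoint:
    """One knowledge point record entered by the teacher (field names assumed)."""
    point_id: int
    course_name: str
    introduction: str
    content: str
    difficulty: float                                   # difficulty coefficient
    prerequisite_ids: List[int] = field(default_factory=list)
    quiz_questions: List[str] = field(default_factory=list)


@dataclass
class ClassroomFeedback:
    """One student's result on one classroom test question (field names assumed)."""
    question_name: str
    point_id: int                                       # knowledge point the question belongs to
    question_content: str
    student_id: str
    correct: bool                                       # test result of the student


def feedback_sequence(records: List[ClassroomFeedback], student_ids: List[str]) -> List[int]:
    """Encode one class session as a fixed-order 0/1 vector (assumed encoding).

    Students without a record for this session are encoded as 0.
    """
    by_student = {r.student_id: r for r in records}
    return [1 if s in by_student and by_student[s].correct else 0 for s in student_ids]
```

A sequence of such vectors, one per class session, is one plausible form for the student classroom feedback data sequence that is fed to the two neural network models.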

Further, the student model module uses the student model to simulate real students participating in the group recommendation model training process in the pre-training module, and the student model is constructed as an Ebbinghaus memory model, a half-life memory model or a Bayesian knowledge tracing model;

and the student model describes:

the current mastery state of the virtual student for each knowledge point;

the process by which a virtual student transitions from one state to another through learning;

the classroom feedback given after learning.

Further, the training of the initial group recommendation model by the pre-training module comprises:

the student model created by the student model module is used as a virtual student to form a class for training;

setting course requirement information and initializing network parameters of the initial group recommendation model;

taking the virtual students of the whole class as the environment and the first neural network model and the second neural network model of the initial group recommendation model as the agent, training the agent with the proximal policy optimization algorithm, and saving the current network parameters when a preset training end condition is met, to obtain the trained group recommendation model.

The course requirement information includes: the number of class sessions, and the pass rate, excellent rate, average score and the like to be achieved by the end of the course.

Further, training the agent using the proximal policy optimization algorithm includes:

step S1: recording the initial state of the virtual student;

step S2: judging whether the first cycle number reaches a preset first maximum cycle number, if so, executing a step S3; otherwise, the following processing is executed in a loop:

step S201: resetting the virtual student status to the initial status recorded in step S1;

step S202: repeatedly executing steps S202-1 to S202-4 until the number of repetitions reaches a preset maximum number of sub-loops; in each repetition, the following are recorded: the student classroom feedback of the virtual students, the recommendation degrees of the knowledge points output by the first neural network model, the evaluation value output by the second neural network model, and the reward value, which is calculated from the classroom feedback of all virtual students on the most recently learned knowledge point according to the course requirement information;

step S202-1: all the virtual students participate in classroom learning, and the virtual students give classroom feedback;

step S202-2: forming a student classroom feedback data sequence from the student classroom feedback given in step S202-1, inputting the sequence into the first neural network model, and obtaining the knowledge point for the next teaching session from its output, namely taking the knowledge point with the maximum recommendation degree as the knowledge point for the next teaching session;

step S202-3: forming a student classroom feedback data sequence from the student classroom feedback given in step S202-1, inputting the sequence into the second neural network model, and obtaining the evaluation value of the sequence from its output;

step S202-4: all the virtual students learn the next teaching knowledge point obtained based on the first neural network;

step S3: judging whether a preset second maximum cycle number is reached, if so, ending; otherwise, the following processing is executed in a loop:

step S301: sampling the student classroom feedback data collected in the step S2;

step S302: calculating a first objective function (namely the output loss of the first neural network model) based on the sampled data, and adjusting the network parameters of the first neural network model according to a preset stochastic gradient ascent algorithm;

step S303: calculating a second objective function (namely the output loss of the second neural network model) based on the sampled data, and adjusting the network parameters of the second neural network model according to a preset stochastic gradient ascent algorithm;
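The control flow of steps S1 to S3 can be sketched as a two-phase loop: first collect data from the virtual class, then update both networks from samples of that data. This is a structural sketch under stated assumptions, not the patented implementation: `ToyClass`, the callables `policy`, `critic`, `update_policy` and `update_critic`, and the convention that `env.learn` returns the reward are all hypothetical stand-ins.

```python
import copy
import random


class ToyClass:
    """Hypothetical stand-in environment: state counts how often each point was taught."""

    def __init__(self, n_points):
        self.state = [0] * n_points

    def feedback(self):
        return list(self.state)

    def learn(self, action):
        self.state[action] += 1
        # Toy reward (assumption): 1.0 the first time a point is taught, else 0.0.
        return 1.0 if self.state[action] == 1 else 0.0


def collect_and_update(env, policy, critic, update_policy, update_critic,
                       n_rollouts, n_sessions, n_updates, batch_size, rng):
    """Two-phase loop of steps S1-S3: collect rollouts, then update both networks."""
    initial_state = copy.deepcopy(env.state)           # S1: record the initial state
    buffer = []
    for _ in range(n_rollouts):                        # S2: first loop
        env.state = copy.deepcopy(initial_state)       # S201: reset the virtual students
        for _ in range(n_sessions):                    # S202: sub-loop over class sessions
            feedback = env.feedback()                  # S202-1: students give feedback
            scores = policy(feedback)                  # S202-2: recommendation degrees
            action = max(range(len(scores)), key=scores.__getitem__)
            value = critic(feedback)                   # S202-3: evaluation value
            reward = env.learn(action)                 # S202-4: students learn the point
            buffer.append((feedback, action, value, reward))
    for _ in range(n_updates):                         # S3: second loop
        batch = rng.sample(buffer, min(batch_size, len(buffer)))   # S301: sample
        update_policy(batch)                           # S302: adjust first network
        update_critic(batch)                           # S303: adjust second network
    return buffer
```

In a real system the two update callables would compute the objective functions of steps S302 and S303 and apply a gradient step; here they are left abstract so that only the loop structure is shown.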

the recommendation and training process of the group teaching recommendation module comprises the following steps:

initializing a group recommendation model, and initializing by using the network parameters stored after the training of a pre-training module is finished;

after the teacher starts teaching, forming a student classroom feedback data sequence from the classroom feedback that the students of each class give through the user terminal, and inputting the sequence into the first neural network model and the second neural network model of the group recommendation model respectively; obtaining the recommended knowledge point to teach in the next class and the corresponding evaluation value from their outputs, and storing the student classroom feedback data sequence, the recommended knowledge point and the evaluation value; and sending the recommended knowledge point for the next class to the corresponding teacher;

and after the class, updating and training the group recommendation model based on historical data stored in the current updating period, wherein the historical data comprises a plurality of groups of data records, and each group of data comprises a student classroom feedback data sequence, a recommended teaching knowledge point and an evaluation value.

The technical scheme provided by the embodiment of the invention at least has the following beneficial effects:

Compared with the prior art, the invention collects student data in the classroom through interactive methods such as voting, question answering, homework and quizzes, and provides the teaching plan with the maximum overall benefit for a given student group (such as a whole class). The overall benefit can be represented by a multi-objective optimization function, which may include, but is not limited to, the pass rate, the excellent rate and the average score. The invention uses a deep reinforcement learning method to carry out goal-oriented teaching path planning for teachers, and can process large-scale, complex data. In addition, the training process, which takes most of the time, is carried out before and after class, so that during class the teacher can immediately obtain recommended knowledge points to teach from the students' classroom feedback.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a block diagram of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention;

FIG. 2 is a teaching process data sequence chart of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention;

FIG. 3 is a flow chart of a pre-training module of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention;

FIG. 4 is a flowchart of a group teaching recommendation module of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention;

FIG. 5 is a clip function diagram of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The embodiment of the invention provides a group teaching recommendation system based on deep reinforcement learning. As shown in FIG. 1, the system comprises: a user terminal (used by teachers or students to log in to the system), a knowledge point management module, a student data management module, a student model module, a pre-training module and a group teaching recommendation module. The specific process of realizing group teaching recommendation through data interaction among the modules is as follows:

(1) a teacher user inputs knowledge point data at the user terminal through the knowledge point management module, and the knowledge point management module sends the knowledge point data to the student model module, the pre-training module and the group teaching recommendation module;

(2) a user terminal (student user) inputs student basic data through a student data management module, and the student data management module sends the student basic data to a student model module and a pre-training module; in the classroom, through the interaction with the user terminal, the classroom feedback of students is collected and sent to the group teaching recommendation module;

(3) the student model module creates a student model based on the currently input related information (knowledge point data and student basic data) according to a preset creating strategy and sends the student model to the pre-training module;

(4) the pre-training module takes a student model created by the student model module as a learning main body, takes data sent by the knowledge point management module and the student data management module as training data, and trains a preset initial group recommendation model to obtain a trained group recommendation model;

(5) the group teaching recommendation module calls a group recommendation model trained by the pre-training module, and in the course teaching process, teaching recommendation information is output and sent to corresponding teacher users in combination with the classroom feedback of students of each class of the curriculum; and storing student classroom feedback collected by the student data management module; and updating and training the group recommendation model based on the classroom feedback of the students stored in the current period according to the configured model updating period in the course teaching process.

In this embodiment, the knowledge point management module is configured to: receiving and storing knowledge point data (namely knowledge point information) input by an expert; the received data is provided as a data set to other modules for use. The expert refers to a teacher or a group of teachers with profound teaching experience and familiar with the course knowledge points; the knowledge point information comprises knowledge point ID, belonging course name, knowledge point introduction, knowledge point content, knowledge point difficulty coefficient, preposed knowledge point ID of the knowledge point, classroom test question matched with the knowledge point and knowledge point related data.

In this embodiment, the student data management module is configured to: receiving and storing the student basic information input by the students; collecting and storing classroom feedback data of students in a classroom test interaction mode; the received data is provided as a data set to other modules for use. The basic information of the students comprises the study numbers, names, ages, sexes, ages and types of the students; the classroom feedback data comprises test subject names, affiliated knowledge points ID, test subject contents, ID of students participating in testing and test results of the students; the data set comprises a student basic information data set and a classroom feedback data set; the data sequence generated in the teaching process in this embodiment is shown in fig. 2.

In this embodiment, the student model module is configured to create a student model based on the student basic information data set, and to use the student model to simulate students in the group recommendation model training process in the pre-training module. The student model is implemented as an Ebbinghaus memory model and describes the following:

(1) describing the current grasping state of the virtual student for each knowledge point, the formula in this embodiment is as follows:

wherein $P_i$ represents the probability that the student has mastered the i-th knowledge point; the formula also contains a term representing the mastery probability of the prerequisite knowledge points of the i-th knowledge point; $\theta$ is a difficulty coefficient, determined according to the specific conditions of the student and the knowledge point; $D$ is the time elapsed since the knowledge point was last learned; and $S$ is the total number of times the knowledge point has been learned;

(2) describing the process of how a virtual student transitions from one state to another by learning, in this embodiment by changing D and S in the above formula;

(3) describing the classroom feedback after learning: in this embodiment, a random number between 0 and 1 is sampled; if the random number is smaller than $P_i$ in the above formula, the student is considered to have answered the question on that knowledge point correctly; otherwise, the student is considered to have answered it incorrectly.
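The three aspects of the student model (mastery state, state transition through learning, and sampled classroom feedback) can be illustrated with a small simulation. Because the mastery formula itself is not legible in this text, the sketch below assumes an exponential forgetting curve, $P_i = \big(\prod_j P_j\big)\, e^{-\theta_i D / S}$ with $j$ ranging over the prerequisite knowledge points; that form, the class name `VirtualStudent` and all parameter names are assumptions made for illustration only.

```python
import math
import random


class VirtualStudent:
    """Toy virtual student; the mastery formula used here is an assumption."""

    def __init__(self, n_points, prerequisites, theta, rng):
        self.prerequisites = prerequisites       # point index -> prerequisite indices (acyclic)
        self.theta = theta                       # difficulty coefficient per point
        self.last_studied = [None] * n_points    # time of the most recent study (gives D)
        self.times_studied = [0] * n_points      # S: total number of times studied
        self.rng = rng

    def mastery(self, i, now):
        """Assumed form: P_i = (product of prerequisite P_j) * exp(-theta_i * D / S)."""
        if self.times_studied[i] == 0:
            return 0.0
        d = now - self.last_studied[i]
        own = math.exp(-self.theta[i] * d / self.times_studied[i])
        pre = 1.0
        for j in self.prerequisites.get(i, []):
            pre *= self.mastery(j, now)
        return pre * own

    def learn(self, i, now):
        """State transition: learning resets D to zero and increments S."""
        self.last_studied[i] = now
        self.times_studied[i] += 1

    def answer(self, i, now):
        """Classroom feedback: correct if a uniform sample in [0, 1) falls below P_i."""
        return self.rng.random() < self.mastery(i, now)
```

Under this assumed curve, repeated study (larger `S`) slows forgetting and a long gap since the last study (larger `D`) lowers the mastery probability, which matches the roles the text assigns to $D$ and $S$.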

In this embodiment, the pre-training module is configured to train the group recommendation model before class through the proximal policy optimization algorithm, based on the student model module; the flow is shown in FIG. 3. The group recommendation model consists of a recommendation neural network and a critic neural network. Because the students' feedback data is a time-ordered sequence, the recommendation neural network must be able to process sequence input, and it is therefore a recurrent neural network. The critic neural network has a similar structure: both networks comprise an input layer, one or more hidden layers and an output layer, the input layer receives the student classroom feedback sequence, and the number of hidden layers of the two networks may be the same or different. The output layer is the main difference between them. The output layer of the recommendation neural network performs classification output using a softmax function, and its output represents the recommendation degree of each knowledge point of the current course for the next class (when a recommendation result is formed, the knowledge point with the maximum recommendation degree is taken as the result). The output layer of the critic neural network uses a linear function, and its output represents the evaluation value of the behavior at each sampling moment, that is, the evaluation of the current classroom teaching.
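How the output layer of the recommendation neural network turns raw activations into a recommendation can be sketched as follows. The function names and the example activations are hypothetical; only the softmax step and the choice of the maximum recommendation degree come from the description.

```python
import math


def softmax(logits):
    """Numerically stable softmax over the output-layer activations."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recommend(logits):
    """Return (index of the recommended knowledge point, recommendation degrees)."""
    degrees = softmax(logits)
    best = max(range(len(degrees)), key=degrees.__getitem__)
    return best, degrees


# Example with three knowledge points and made-up activations.
best_point, degrees = recommend([0.1, 2.0, -1.0])
```

The recommendation degrees sum to 1, so they can be read as a probability distribution over the knowledge points of the course.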
The group recommendation model is trained using Proximal Policy Optimization (PPO), and the specific training process is as follows:

(1) the student models created by the student model module are used as virtual students to form a class that participates in training, for example a class of 20 students;

(2) setting course requirement information;

(3) initializing the group recommendation model, namely initializing the network parameters of the recommendation neural network and the critic neural network;

(4) taking the virtual students of the whole class as the environment and the recommendation neural network and the critic neural network as the agent, and training the agent using the proximal policy optimization algorithm;

(5) after training of the recommendation neural network and the critic neural network is completed, saving the current network parameters of both networks and providing them to the group teaching recommendation module.

As a possible implementation, in the above training process: in (2), the course requirement information specifies 80 class sessions, a pass rate of 0.8 and an excellent rate of 0.2 to be achieved at the end of the course, and an average score that is as high as possible; in (3), the initialized group recommendation model uses neural networks with two layers and a hidden layer of 64 neurons; in (4), the proximal policy optimization algorithm proceeds as follows:

(1) recording the initial state of the virtual student;

(2) repeating the following steps a specified number of times:

(2-a) repeating the following steps a specified number of times:

I. resetting the virtual student state to the initial state saved in (1);

II. Repeating the following steps until the number of learning sessions reaches the set number of class sessions. In each repetition, the following are recorded: the test results returned by the students, the knowledge point output by the recommendation neural network, the evaluation value output by the critic neural network, and the reward value, which is calculated from the classroom feedback of all students on the most recently learned knowledge point according to the course requirement information. The reward value formula is as follows:

$$\mathrm{Reward} = \lambda_1 R_p + \lambda_2 R_e + \lambda_3 R_a$$

wherein $R_p$ denotes the pass rate, $R_e$ denotes the excellent rate, and $R_a$ denotes the average mastery probability of all students for the knowledge point; $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the respective weights of the three terms, each greater than or equal to 0; they are empirical values and are not limited by the present invention. In this embodiment, they are 5, 3 and 1, respectively.

1) All the virtual students participate in the classroom test, and the virtual students return test results;

2) passing the classroom feedback into the recommendation neural network, which outputs the recommended knowledge point for the next teaching session;

3) passing the classroom feedback into the critic neural network, which outputs an evaluation value;

4) all virtual students learn the knowledge point output by the recommendation neural network;
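The reward value recorded in each repetition of step (2-a) is a weighted sum of three class-level statistics. The sketch below uses the weights 5, 3 and 1 of this embodiment; how pass and excellence are decided for an individual student is not stated here, so the thresholds `pass_threshold` and `excellent_threshold` on the mastery probability are illustrative assumptions.

```python
def reward(mastery, pass_threshold=0.6, excellent_threshold=0.9,
           weights=(5.0, 3.0, 1.0)):
    """Reward = l1 * R_p + l2 * R_e + l3 * R_a for one knowledge point.

    mastery: mastery probability of each student for the point just learned.
    The two thresholds are assumptions; the weights follow this embodiment.
    """
    n = len(mastery)
    r_p = sum(p >= pass_threshold for p in mastery) / n        # pass rate
    r_e = sum(p >= excellent_threshold for p in mastery) / n   # excellent rate
    r_a = sum(mastery) / n                                     # average mastery
    l1, l2, l3 = weights
    return l1 * r_p + l2 * r_e + l3 * r_a


# Four virtual students: three pass, two are excellent.
example = reward([1.0, 0.5, 0.7, 0.95])
```

With these inputs the pass rate is 0.75, the excellent rate is 0.5 and the average mastery is 0.7875, so the weighted sum is 5 × 0.75 + 3 × 0.5 + 1 × 0.7875 = 6.0375.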

(2-b) repeating the following steps a specified number of times:

I. sampling is performed from the data collected in (2-a).

II. Calculating an objective function using the sampled data and training the recommendation neural network with a stochastic gradient ascent algorithm, the formula being as follows:

wherein $\theta_k$ denotes the parameters of the recommendation neural network at the k-th training; $D_k$ denotes the sampled data set; $\tau$ denotes the sampled data under one teaching path, i.e. one complete teaching trajectory sample (for example, if the course has 40 class sessions, the teaching of those 40 sessions forms one group of data, and $D_k$ is composed of many such groups $\tau$); $T$ is the duration of the course; and $\pi_\theta(a_t|s_t)$ denotes the probability that, with parameters $\theta$, the output is $a_t$ when the classroom feedback input at time $t$ is $s_t$. The clip function is illustrated in FIG. 5. The input parameters of $\mathrm{clip}()$ include $r_t(\theta)$ and the boundary value $\epsilon$: if $r_t(\theta) \le 1-\epsilon$, then $\mathrm{clip}() = 1-\epsilon$; if $r_t(\theta) \ge 1+\epsilon$, then $\mathrm{clip}() = 1+\epsilon$; and if $1-\epsilon < r_t(\theta) < 1+\epsilon$, then $\mathrm{clip}() = r_t(\theta)$. In this embodiment, the boundary value $\epsilon$ is 0.1.

The advantage value of the behavior at time $t$ is calculated as follows:

$$\xi_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$

wherein $\xi_t$ denotes the intermediate parameter at time $t$ (the intermediate parameters at different times are all calculated by this formula); $\gamma$ denotes the discount factor, which is 0.99 in this embodiment; $T$ denotes the total time up to now; $r_t$ denotes the reward value obtained at time $t$; and $V(s_t)$ denotes the evaluation value given by the critic neural network at time $t$;
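The clip function and the intermediate parameter $\xi_t$ can be written out directly. The sketch below is illustrative: the treatment of the value after the last time step (taken as 0) and the backward discounted sum in `advantages`, including its parameter `lam`, are assumptions in the style of generalized advantage estimation, since the exact advantage formula is not legible in this text.

```python
def clip(ratio, eps=0.1):
    """clip(): bound the ratio r_t(theta) to the interval [1 - eps, 1 + eps]."""
    return max(1.0 - eps, min(1.0 + eps, ratio))


def clipped_term(ratio, advantage, eps=0.1):
    """Per-step term of the standard clipped PPO objective (assumed form)."""
    return min(ratio * advantage, clip(ratio, eps) * advantage)


def td_residuals(rewards, values, gamma=0.99):
    """xi_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    The value after the final step is taken as 0 (assumption).
    """
    out = []
    for t, r in enumerate(rewards):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        out.append(r + gamma * next_v - values[t])
    return out


def advantages(residuals, gamma=0.99, lam=1.0):
    """Backward discounted sum of the residuals; lam is an assumed parameter."""
    adv = [0.0] * len(residuals)
    running = 0.0
    for t in reversed(range(len(residuals))):
        running = residuals[t] + gamma * lam * running
        adv[t] = running
    return adv
```

Taking the minimum in `clipped_term` means that a ratio far from 1 cannot increase the objective further, which is what keeps each policy update close to the previous policy.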

III. Calculating an objective function using the sampled data and training the critic neural network with a stochastic gradient ascent algorithm, the formula being as follows:

wherein the symbols of the formula denote, respectively, the parameters of the critic neural network at the k-th training and the output (evaluation value) of the critic neural network at time $t$ under the current network parameters.
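For reference, the widely used clipped form of the proximal policy optimization objectives can be written with the symbols defined above. This is the textbook formulation, given as an assumption about what steps II and III of (2-b) compute rather than as a statement of the formulas of this embodiment; the symbols $\hat{A}_t$ (advantage), $\lambda$ (advantage decay), $\phi$ (critic parameters) and $\hat{R}_t$ (return target) are introduced here for illustration.

```latex
\begin{aligned}
r_t(\theta) &= \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_k}(a_t \mid s_t)} \\
\theta_{k+1} &= \arg\max_{\theta} \frac{1}{|D_k|\,T} \sum_{\tau \in D_k} \sum_{t=0}^{T}
  \min\!\Big( r_t(\theta)\,\hat{A}_t,\
  \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t \Big) \\
\hat{A}_t &= \sum_{l=0}^{T-t} (\gamma\lambda)^{l}\,\xi_{t+l},
  \qquad \xi_t = r_t + \gamma V(s_{t+1}) - V(s_t) \\
\phi_{k+1} &= \arg\min_{\phi} \frac{1}{|D_k|\,T} \sum_{\tau \in D_k} \sum_{t=0}^{T}
  \big( V_\phi(s_t) - \hat{R}_t \big)^2
\end{aligned}
```

Note that $r_t(\theta)$ is the probability ratio between the new and old policies, whereas $r_t$ without an argument is the reward value at time $t$. Minimizing the squared error in the last line is equivalent to gradient ascent on its negative, which is one way to read the use of an ascent algorithm for the critic neural network.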

In this embodiment, the group teaching recommendation module is configured to: receive and store the students' classroom feedback data during class; give a recommended knowledge point for teaching based on the classroom feedback; and further train the group recommendation model after class using the classroom feedback data. The recommendation and training process is shown in FIG. 4:

(1) initializing a group recommendation model, and initializing by using parameters stored after the training of a pre-training module is finished;

(2) the teacher starts teaching; the students give classroom feedback, which is input into the recommendation neural network and the critic neural network; the recommendation neural network outputs the recommended knowledge point to teach, the critic neural network outputs an evaluation value, and all of these data are stored;

(3) step (2) is executed repeatedly, with the teacher teaching according to the recommended knowledge points. After a certain number of repetitions, the objective function is calculated after class using the data stored so far, and the group recommendation model is trained again;

(4) step (3) is repeated until the course is finished.
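The recommend-in-class, retrain-after-class cycle of steps (1) to (4) can be sketched as a simple loop. `StubModel`, its `recommend` and `update` methods, and the parameter `update_every` are hypothetical stand-ins for the trained group recommendation model and the configured model update period; clearing the stored data after each update is likewise an assumption.

```python
class StubModel:
    """Hypothetical stand-in for the trained group recommendation model."""

    def __init__(self):
        self.updates = 0

    def recommend(self, feedback):
        # Toy rule: recommend the knowledge point with the fewest correct answers.
        point = min(range(len(feedback)), key=feedback.__getitem__)
        return point, float(sum(feedback))

    def update(self, history):
        # A real model would recompute its objective functions on `history` here.
        self.updates += 1


def run_course(model, sessions, update_every):
    """Recommend during class, store the data, and retrain after class periodically."""
    history, recommendations = [], []
    for n, feedback in enumerate(sessions, start=1):
        point, value = model.recommend(feedback)       # in class: immediate inference
        history.append((feedback, point, value))       # store data for later training
        recommendations.append(point)
        if n % update_every == 0:                      # after class: update period reached
            model.update(history)
            history = []                               # start a new update period
    return recommendations
```

Because inference is a single forward pass and training happens only between classes, the teacher receives the recommendation immediately during class, which is the efficiency gain the description attributes to this design.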

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

What has been described above are merely some embodiments of the present invention. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the inventive concept thereof, and these changes and modifications can be made without departing from the spirit and scope of the invention.

Claims

1. A group teaching recommendation system based on deep reinforcement learning, characterized by comprising: a user terminal, a knowledge point management module, a student data management module, a student model module, a pre-training module and a group teaching recommendation module;

the pre-training module takes a student model created by the student model module as a learning main body, takes data sent by the knowledge point management module and the student data management module as training data, and trains a preset initial group recommendation model to obtain a trained group recommendation model; the initial group recommendation model comprises a first neural network model and a second neural network model, the first neural network model and the second neural network model comprise an input layer, at least one hidden layer and an output layer, wherein the input layer is a student classroom feedback data sequence, the hidden layer is a neural network capable of processing sequence input, and the output layer of the first neural network model is used for outputting recommendation degrees of all knowledge points of a current course; the output layer of the second neural network model is used for outputting the evaluation value of the current classroom teaching;

the group teaching recommendation module calls the group recommendation model trained by the pre-training module; during course teaching, outputs teaching recommendation information based on the student classroom feedback of each class of the course and sends the teaching recommendation information to the corresponding teacher user; stores the student classroom feedback collected by the student data management module; and, during course teaching, updates and retrains the group recommendation model according to a configured model update period, based on the student classroom feedback stored during the current period;

the output teaching recommendation information comprises a recommended knowledge point for the next class and an evaluation value of the current student classroom feedback data sequence, wherein the recommended knowledge point for the next class is the knowledge point with the maximum recommendation degree.
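
As a non-limiting illustration of the selection rule at the end of claim 1, the following Python sketch picks the knowledge point with the largest recommendation degree and returns it together with the evaluation value. The function name `recommend_next`, the dictionary layout and the sample knowledge point IDs are assumptions made for this example only; they do not appear in the original text.

```python
from typing import Dict, Tuple


def recommend_next(recommendation: Dict[str, float], evaluation: float) -> Tuple[str, float]:
    """Return (knowledge point with the largest recommendation degree, evaluation value).

    `recommendation` stands in for the output of the first neural network model
    and `evaluation` for the output of the second one (both hypothetical inputs).
    """
    if not recommendation:
        raise ValueError("no knowledge points to recommend")
    best_id = max(recommendation, key=recommendation.get)
    return best_id, evaluation


# Hypothetical recommendation degrees for three knowledge points.
scores = {"kp_limits": 0.15, "kp_derivatives": 0.60, "kp_integrals": 0.25}
next_kp, value = recommend_next(scores, evaluation=0.42)
print(next_kp, value)
```
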

2. The group teaching recommendation system of claim 1, wherein the knowledge point data includes: knowledge point ID, name of the course to which the knowledge point belongs, knowledge point introduction, knowledge point content, knowledge point difficulty coefficient, IDs of the prerequisite knowledge points of the knowledge point, classroom test questions matched with the knowledge point, and materials related to the knowledge point.
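
By way of example only, the knowledge point data of claim 2 could be held in a record such as the one sketched below. The class name `KnowledgePoint`, its field names and the helper `missing_prerequisites` are hypothetical and merely mirror the items listed in the claim; the helper shows one way the prerequisite IDs might be validated against a catalog.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KnowledgePoint:
    # Field names are illustrative; they follow the items listed in claim 2.
    kp_id: str
    course_name: str
    introduction: str
    content: str
    difficulty: float
    prerequisite_ids: List[str] = field(default_factory=list)
    quiz_questions: List[str] = field(default_factory=list)
    related_materials: List[str] = field(default_factory=list)


def missing_prerequisites(catalog: Dict[str, KnowledgePoint]) -> Dict[str, List[str]]:
    """Map each knowledge point ID to the prerequisite IDs absent from the catalog."""
    report: Dict[str, List[str]] = {}
    for kp in catalog.values():
        missing = [p for p in kp.prerequisite_ids if p not in catalog]
        if missing:
            report[kp.kp_id] = missing
    return report


catalog = {
    "kp1": KnowledgePoint("kp1", "Calculus", "Limits", "...", 0.3),
    "kp2": KnowledgePoint("kp2", "Calculus", "Derivatives", "...", 0.5,
                          prerequisite_ids=["kp1", "kp0"]),
}
print(missing_prerequisites(catalog))
```
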

3. The group teaching recommendation system of claim 1, wherein the student basic data comprises: student number, name, age, gender and student type; and the student classroom feedback comprises the following data: test question name, ID of the knowledge point to which the test question belongs, test question content, IDs of the students taking the test, the students' test results, and the like.
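
As a minimal sketch, the classroom feedback items of claim 3 can be stored as flat records and aggregated per knowledge point before being fed to a model. The record type `QuizFeedback`, the boolean `correct` field (a simplification of "test result") and the function `correct_rate_by_kp` are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QuizFeedback:
    # Illustrative fields mirroring claim 3; `correct` simplifies the test result.
    question_name: str
    kp_id: str
    question_content: str
    student_id: str
    correct: bool


def correct_rate_by_kp(records: List[QuizFeedback]) -> Dict[str, float]:
    """Fraction of correct answers per knowledge point ID."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for r in records:
        totals[r.kp_id] = totals.get(r.kp_id, 0) + 1
        hits[r.kp_id] = hits.get(r.kp_id, 0) + int(r.correct)
    return {kp: hits[kp] / totals[kp] for kp in totals}


records = [
    QuizFeedback("q1", "kp1", "...", "s1", True),
    QuizFeedback("q1", "kp1", "...", "s2", False),
    QuizFeedback("q2", "kp2", "...", "s1", True),
    QuizFeedback("q2", "kp2", "...", "s2", True),
]
print(correct_rate_by_kp(records))
```
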

4. The group teaching recommendation system of claim 1, wherein the student model module uses a student model to simulate real students taking part in the group recommendation model training process of the pre-training module, the student model being constructed as an Ebbinghaus memory model, a half-life memory model or a Bayesian knowledge tracing model;

and the description of the model includes:

classroom feedback after learning is described.
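
For illustration only, a virtual student of the kind named in claim 4 might be sketched as follows. The exponential forgetting curve R = exp(-t/S) and the half-life form p = 2^(-t/h) are the textbook expressions commonly associated with Ebbinghaus-style and half-life memory models; the class `VirtualStudent`, the rule that each review doubles memory strength, and all parameter names are assumptions and are not taken from the original text.

```python
import math
import random


def ebbinghaus_recall(elapsed: float, strength: float) -> float:
    """Exponential forgetting curve R = exp(-elapsed / strength)."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    return math.exp(-elapsed / strength)


def half_life_recall(elapsed: float, half_life: float) -> float:
    """Half-life form p = 2 ** (-elapsed / half_life)."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 2.0 ** (-elapsed / half_life)


class VirtualStudent:
    """Toy student whose quiz answers are sampled from a forgetting curve."""

    def __init__(self, base_strength: float = 1.0, seed=None):
        self.base_strength = base_strength
        self.strength = {}   # knowledge point ID -> memory strength
        self.last_seen = {}  # knowledge point ID -> time of last study
        self.rng = random.Random(seed)

    def learn(self, kp_id: str, now: float) -> None:
        # Assumption: the first study sets base strength, each review doubles it.
        previous = self.strength.get(kp_id)
        self.strength[kp_id] = self.base_strength if previous is None else previous * 2
        self.last_seen[kp_id] = now

    def recall_probability(self, kp_id: str, now: float) -> float:
        if kp_id not in self.strength:
            return 0.0
        return ebbinghaus_recall(now - self.last_seen[kp_id], self.strength[kp_id])

    def answer(self, kp_id: str, now: float) -> bool:
        """Classroom feedback after learning: True means a correct answer."""
        return self.rng.random() < self.recall_probability(kp_id, now)
```
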

5. The group teaching recommendation system of claim 1, wherein the training of the initial group recommendation model by the pre-training module comprises:

6. The group teaching recommendation system of claim 5, wherein the course requirement information includes: the number of courses, and the passing rate, excellence rate and average score to be achieved at the end of the course.
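
As a hedged example of how the course requirement information of claim 6 could be turned into a scalar training signal, the sketch below computes the passing rate, excellence rate and average score of a group and penalizes any shortfall against the targets. The thresholds of 60 and 85 points, the 100-point scale, the class `CourseRequirement` and the function `shortfall_reward` are all assumptions; the original text does not specify how the reward is calculated.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class CourseRequirement:
    num_classes: int
    target_pass_rate: float
    target_excellence_rate: float
    target_average: float


def group_metrics(scores: Sequence[float], pass_mark: float = 60.0,
                  excellent_mark: float = 85.0) -> Tuple[float, float, float]:
    """Return (passing rate, excellence rate, average score) for a group."""
    n = len(scores)
    if n == 0:
        raise ValueError("empty score list")
    pass_rate = sum(s >= pass_mark for s in scores) / n
    excellence_rate = sum(s >= excellent_mark for s in scores) / n
    return pass_rate, excellence_rate, sum(scores) / n


def shortfall_reward(scores: Sequence[float], req: CourseRequirement,
                     full_mark: float = 100.0) -> float:
    """Hypothetical reward: 0 when every target is met, negative otherwise."""
    pass_rate, excellence_rate, average = group_metrics(scores)
    gap = (max(0.0, req.target_pass_rate - pass_rate)
           + max(0.0, req.target_excellence_rate - excellence_rate)
           + max(0.0, req.target_average - average) / full_mark)
    return -gap


req = CourseRequirement(num_classes=16, target_pass_rate=0.9,
                        target_excellence_rate=0.25, target_average=75.0)
print(group_metrics([55, 70, 90, 65]), shortfall_reward([55, 70, 90, 65], req))
```
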

7. The group teaching recommendation system of claim 1, wherein training the agent using the proximal policy optimization (PPO) algorithm comprises:

step S1: recording the initial state of the virtual student;

step S2: judging whether the first cycle number reaches a preset first maximum cycle number, if so, executing step S3; otherwise, the following processing is executed in a loop:

step S202: repeatedly executing step S202-1 to step S202-4 until the number of iterations reaches the preset maximum number of sub-cycles; in each cycle, recording the classroom feedback of the virtual students, the recommendation degrees of the knowledge points output by the first neural network model, the evaluation value output by the second neural network model, and the reward value for the most recently learned knowledge point, calculated from the classroom feedback of all the virtual students according to the course requirement information;

step S202-2: forming a student classroom feedback data sequence from the student classroom feedback given in step S202-1, inputting the student classroom feedback data sequence into the first neural network model, and obtaining the knowledge point to be taught next based on the output of the first neural network model, namely taking the knowledge point with the maximum recommendation degree as the knowledge point to be taught next;

step S202-4: all the virtual students learn the next teaching knowledge point obtained based on the first neural network model;

step S302: calculating a first objective function based on the sampled data, and adjusting the network parameters of the first neural network model according to a preset stochastic gradient ascent algorithm, wherein the first objective function is used for representing the output loss of the first neural network model;

step S303: calculating a second objective function based on the sampled data, and adjusting the network parameters of the second neural network model according to a preset stochastic gradient ascent algorithm, wherein the second objective function is used for representing the output loss of the second neural network model.
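
The sampling loop of claim 7 can be pictured with the following minimal Python sketch. Several steps are not present in the text above (for example S201, S202-1, S202-3 and S301), so the sketch fills them with stated assumptions: the recorded initial state is restored at the start of every outer cycle, the students give feedback before each recommendation, and the second network is queried on the same feedback sequence. The function `collect_teaching_paths` and all callables passed to it are hypothetical stand-ins, not the patented implementation, and the parameter updates of steps S302 and S303 are omitted.

```python
import copy
from typing import Any, Callable, Dict, List


def collect_teaching_paths(initial_students: Any,
                           feedback_fn: Callable[[Any], Any],
                           actor: Callable[[List[Any]], Dict[str, float]],
                           critic: Callable[[List[Any]], float],
                           reward_fn: Callable[[Any], float],
                           learn_fn: Callable[[Any, str], None],
                           max_cycles: int,
                           max_sub_cycles: int) -> List[List[dict]]:
    """Collect one teaching path per outer cycle."""
    dataset = []
    for _ in range(max_cycles):                      # outer loop of step S2
        students = copy.deepcopy(initial_students)   # assumed: restore state from S1
        history, path = [], []
        for _ in range(max_sub_cycles):              # step S202
            feedback = feedback_fn(students)         # assumed content of S202-1
            history.append(feedback)
            degrees = actor(history)                 # S202-2: recommendation degrees
            action = max(degrees, key=degrees.get)   # knowledge point with max degree
            path.append({
                "feedback": feedback,
                "degrees": degrees,
                "value": critic(history),            # assumed: evaluation value
                "reward": reward_fn(feedback),       # reward for the previous knowledge point
                "action": action,
            })
            learn_fn(students, action)               # S202-4: all students learn it
        dataset.append(path)
    return dataset


# Toy stand-ins: each student is the set of knowledge points studied so far.
KPS = ["kp1", "kp2", "kp3"]


def toy_actor(history):
    target = (len(history) - 1) % len(KPS)
    return {kp: 1.0 if i == target else 0.0 for i, kp in enumerate(KPS)}


def toy_learn(students, kp):
    for s in students:
        s.add(kp)


paths = collect_teaching_paths([set(), set()],
                               lambda st: [len(s) for s in st],
                               toy_actor, lambda h: 0.0,
                               lambda fb: sum(fb) / len(fb),
                               toy_learn, max_cycles=2, max_sub_cycles=3)
print([step["action"] for step in paths[0]])
```
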

8. The group teaching recommendation system of claim 7, wherein the first objective function is:

wherein,

$\theta_{k+1}$ represents the network parameters of the first neural network model at the (k+1)-th training iteration;

$D_k$ represents the sampled data set;

$T$ represents the course duration;

$\tau$ represents the sampled data under one teaching path;

$\pi_\theta(a_t \mid s_t)$ represents the probability that, when the network parameters are $\theta$ and the student classroom feedback input at time $t$ is $s_t$, the output is $a_t$;

the input parameters of the function clip() include $r_t(\theta)$ and the boundary value $\epsilon$: if $r_t(\theta) \le 1-\epsilon$, then clip() $= 1-\epsilon$; if $r_t(\theta) \ge 1+\epsilon$, then clip() $= 1+\epsilon$; and if $1-\epsilon < r_t(\theta) < 1+\epsilon$, then clip() $= r_t(\theta)$;

wherein the advantage value of the action at time $t$ is calculated with the following formula:

$\xi_t = r_t + \gamma V(s_{t+1}) - V(s_t)$;

wherein $\xi_t$ represents an intermediate parameter at time $t$, $\gamma$ represents a preset discount factor, $r_t$ represents the reward value obtained at time $t$, and $V(s_t)$ represents the evaluation value output by the second neural network model at time $t$;
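
As a small worked example of the quantities defined in this claim, the following Python sketch implements the piecewise clip() rule and the intermediate parameter ξ_t. The function names are hypothetical. The helper `clipped_term` uses the min(r·A, clip(r)·A) form known from the PPO literature, which is an assumption here because the objective formula itself is not reproduced in the text.

```python
def clip(ratio: float, eps: float) -> float:
    """Piecewise rule from the claim: bound the ratio to [1 - eps, 1 + eps]."""
    return max(1.0 - eps, min(ratio, 1.0 + eps))


def td_residual(reward: float, value_t: float, value_next: float, gamma: float) -> float:
    """xi_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    return reward + gamma * value_next - value_t


def clipped_term(ratio: float, advantage: float, eps: float) -> float:
    """Assumed PPO-clip summand: min(ratio * A, clip(ratio) * A)."""
    return min(ratio * advantage, clip(ratio, eps) * advantage)


print(clip(1.5, 0.2), clip(0.5, 0.2), clip(1.05, 0.2))
print(td_residual(reward=1.0, value_t=0.5, value_next=2.0, gamma=0.9))
```

With a positive advantage, the clipped term stops rewarding ratios above 1 + ε; with a negative advantage, it stops rewarding ratios below 1 − ε, which limits how far a single update can move the policy.
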

the second objective function is:

wherein the symbols of the second objective function represent, respectively: the network parameters of the second neural network model at the (k+1)-th training iteration; and the evaluation value output at time $t$ by the second neural network model under the current network parameters.
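
For orientation only, the updates commonly given in the literature for PPO with a clipped objective are written out below. They agree with the symbol definitions of claim 8, but the claimed formulas themselves are not reproduced in this text, so whether they have exactly this form is an assumption. The symbols $\hat{A}_t$ (advantage estimate), $\lambda$ (an extra smoothing hyperparameter), $\phi$ (parameters of the second neural network model) and $\hat{R}_t$ (reward-to-go target) are introduced here for illustration. Note that $r_t(\theta)$ denotes the probability ratio, whereas $r_t$ without an argument denotes the reward value.

```latex
% Policy update (standard PPO-clip form, assumed)
\theta_{k+1} = \arg\max_{\theta}\; \frac{1}{|D_k|\,T} \sum_{\tau \in D_k} \sum_{t=0}^{T}
  \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big),
\qquad
r_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_k}(a_t \mid s_t)}

% One common advantage estimate built from the intermediate parameter \xi_t (assumed)
\hat{A}_t = \sum_{l=0}^{T-t} (\gamma\lambda)^{l}\,\xi_{t+l},
\qquad
\xi_t = r_t + \gamma V(s_{t+1}) - V(s_t)

% Value update (standard squared-error fit, assumed)
\phi_{k+1} = \arg\min_{\phi}\; \frac{1}{|D_k|\,T} \sum_{\tau \in D_k} \sum_{t=0}^{T}
  \big( V_{\phi}(s_t) - \hat{R}_t \big)^{2}
```

Claim 7 describes both updates as gradient ascent; under the form sketched here, that would correspond to maximizing the negative of the squared error in the value update.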

9. The group teaching recommendation system of claim 1, wherein the recommendation and training process of the group teaching recommendation module is:

10. The group teaching recommendation system of claim 1, wherein the hidden layers of the first and second neural network models are long short-term memory (LSTM) recurrent neural networks.
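
To make the role of the LSTM hidden layer concrete, the sketch below implements a single LSTM cell step with the usual input, forget, cell and output gates in plain Python and folds a feedback sequence into a final hidden state. It is a minimal illustration under assumed shapes and names (`lstm_step`, `encode_sequence`, list-based weights); a practical system would normally rely on a deep learning library, and nothing here is taken from the original implementation.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: Sequence[float], h: Sequence[float], c: Sequence[float],
              weights: Sequence[Matrix], biases: Sequence[Sequence[float]]
              ) -> Tuple[List[float], List[float]]:
    """One LSTM cell step.

    weights holds four matrices (input, forget, cell, output gates), each of
    shape hidden_size x (len(x) + hidden_size); biases holds four vectors.
    """
    z = list(x) + list(h)  # concatenate current input and previous hidden state

    def affine(w: Matrix, b: Sequence[float]) -> List[float]:
        return [sum(wi * zi for wi, zi in zip(row, z)) + bi for row, bi in zip(w, b)]

    i = [_sigmoid(a) for a in affine(weights[0], biases[0])]
    f = [_sigmoid(a) for a in affine(weights[1], biases[1])]
    g = [math.tanh(a) for a in affine(weights[2], biases[2])]
    o = [_sigmoid(a) for a in affine(weights[3], biases[3])]
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new


def encode_sequence(seq: Sequence[Sequence[float]], hidden_size: int,
                    weights: Sequence[Matrix],
                    biases: Sequence[Sequence[float]]) -> List[float]:
    """Run the cell over a feedback sequence and return the last hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in seq:
        h, c = lstm_step(x, h, c, weights, biases)
    return h


# Hypothetical sizes: 3 feedback features per class, 2 hidden units.
IN, HID = 3, 2
W = [[[0.1] * (IN + HID) for _ in range(HID)] for _ in range(4)]
B = [[0.0] * HID for _ in range(4)]
feedback_sequence = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [1.0, 1.0, 0.0]]
print(encode_sequence(feedback_sequence, HID, W, B))
```

The final hidden state could then feed two separate output layers, one producing recommendation degrees and one producing the evaluation value, in line with the two-network structure of claim 1.
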