CN112784154A

CN112784154A - Online teaching recommendation system with data enhancement

Info

Publication number: CN112784154A
Application number: CN202011625667.2A
Authority: CN
Inventors: 左琳; 刘念伯; 杨腾杰; 杨梅乙; 邹源甦
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2020-12-31
Filing date: 2020-12-31
Publication date: 2021-05-11
Anticipated expiration: 2040-12-31
Also published as: CN112784154B

Abstract

The invention discloses an online teaching recommendation system with data enhancement, belonging to the field of information technology. The system comprises a data collection module, a data enhancement module, a post-lesson exercise recommendation module and a course content recommendation module. The data collection module receives basic information about students and constructs a real data set. The data enhancement module generates a large-capacity virtual data set from the real data set constructed by the data collection module. The post-lesson exercise recommendation module and the course content recommendation module each train their own recommendation model on the virtual data set. According to the student's specific learning situation, after a student finishes the learning content of one unit, the post-lesson exercise recommendation module generates recommended post-lesson exercises and pushes them to the student; after the student finishes the learning content of a specified number of units, the course content recommendation module generates recommended course learning content and pushes it to the student. The invention is used for online teaching and addresses the lack of training data in prior education recommendation technology.

Description

Online teaching recommendation system with data enhancement

Technical Field

The invention belongs to the technical field of information, and particularly relates to an online teaching recommendation technology with data enhancement.

Background

The number of online teaching sites is increasing. Learning results are usually checked by having students watch a complete set of learning videos and then complete post-lesson exercises or tests. Demands on teaching quality keep rising, and personalized teaching has become an important requirement. Personalized teaching generally refers to an education mode that customizes education targets, education plans, tutoring schemes and the like for each student by collecting data on the student's learning process and taking into account the student's academic ability and learning characteristics. Compared with offline learning, online learning has the drawback that no real teacher is present. Intelligent teaching assistance systems have therefore emerged: they build models from collected student data and then make personalized recommendations of learning-related content to students. Models such as support vector machines, neural networks and stepwise regression have all achieved some success, mainly in performance prediction and dropout prediction.

However, these models also face a common limitation, which is the limitation of the data set. The utility of a model trained by machine learning depends heavily on the volume and quality of the data set used for training. If the quality of the data set is not high, a stable model cannot be trained, and if the capacity of the data set is too small, the generalization capability of the trained model is insufficient. Therefore, to realize accurate personalized recommendation, it is necessary to obtain a better data set in addition to finding a more excellent model.

Data collection has always been difficult in the education field. Data for early education research came mainly from surveys such as questionnaires. Questionnaires are inflexible and easily misunderstood, and most of them have response ranges fixed in advance by their designers, so respondents are constrained in their answers and more detailed, deeper information may be missed. Alternatively, research teams obtained data through long-term experimental observation, which is time-consuming and labor-intensive and interferes with teaching to some extent. With the rapid development of education informatization, online education platforms are multiplying and provide a convenient channel for data collection. Data collection nevertheless faces problems such as course popularity, student numbers, dropout rates and sparse user behavior. For example, a newly launched course or teaching method may struggle to attract enough students, or may see little interactive learning behavior; such low-quality data makes assessing and analyzing the course or teaching method very difficult. There is also a privacy problem: many users do not want their behavior data to be collected and stored. In addition, most data collected by websites is unstructured and requires secondary processing before it becomes usable. For example, students' answer data and the related test questions must be manually labeled with knowledge points, which is time-consuming and labor-intensive.

The difficulty of obtaining experimental data greatly limits research in the education field. From the perspective of data simulation, one can collect a small data set and construct a reasonable student behavior simulator that generates realistic student behavior data in the same format as the original data set, thereby enhancing it. The enhanced data can be applied directly to evaluating teaching models or optimizing recommendation algorithms, which greatly improves the feasibility of data-driven education research and teaching innovation.

Data generation refers to producing virtual data based on an existing model or data class. The generated data should conform to the characteristics of the original model or data class as closely as possible, so that it can pass for real data. Unlike duplicated data, however, generated data must also differ sufficiently from the data in the original class while still satisfying the characteristics of that class. Data generation based on artificial intelligence is mainly applied to image generation. The main methods are based on the Generative Adversarial Network (GAN), from which variants such as WGAN (Wasserstein GAN) and CTGAN (Consistency Term GAN) have been derived through continued improvement.

In the prior art, Chinese patent application publication No. CN108711138A discloses a grayscale image colorization method based on a generative adversarial network. Within the generative adversarial network framework, the goal is for the trained network to generate high-resolution, high-quality images. First, samples are fed into the generative adversarial network to start training; once training is stable, PGGAN is used to raise the resolution of the generated images. Next, WGAN-GP is added to improve the original adversarial network, which addresses gradient instability and mode collapse and improves the optimization process. Finally, the conditional constraint of CGAN is added so that the network can generate images in a specified style according to the given conditions. However, this technique does not extend the use of generative adversarial networks to other fields. Moreover, it uses a single network to generate the virtual result, whereas in education the data involved in typical research comprises student characteristic information, learning content and students' learning results; generating all of this with a single network would introduce large errors.

Disclosure of Invention

The object of the invention is as follows: in view of the problems above, an online teaching recommendation system with data enhancement is provided, which automatically recommends learning content and post-lesson exercises according to each student's situation during online learning. A virtual education data set is produced by generating virtual students, which addresses the problems that education data sets have few samples, are difficult and costly to acquire, and are subject to personal privacy protection. The recommendation models are trained on a large amount of virtual data and help students consolidate what they have learned and study further during online learning. The system serves as an auxiliary system for online teaching.

The invention relates to an online teaching recommendation system with data enhancement, which comprises a data collection module, a data enhancement module, a post-lesson exercise recommendation module and a course content recommendation module;

the data collection module is used for receiving and storing student basic information input by students, and collecting and storing learning content information of the students and behavior data of the students after the students agree; constructing a real data set based on the basic information of the students, the learning content information of the students and the behavior data of the students;

the data enhancement module generates a virtual data set according to the real data set constructed by the data collection module, wherein the data capacity of the virtual data set is greater than that of the real data set; wherein, the generation processing process of each piece of virtual data in the virtual data set is as follows:

step one: using the student information items included in the real data set as a template, obtain the characteristics of the real data set and divide them into three categories: student characteristics, learning content characteristics and learning behavior characteristics;

step two: training a virtual student generator by using the GAN and combining student characteristic data in the real data set;

step three: training a virtual learning content generator by using the GAN and combining learning content characteristic data in the real data set;

step four: training a student learning behavior generator by using the CTGAN together with the student characteristic data and the learning behavior characteristic data in the real data set;

step five: respectively inputting noise into the trained virtual student generator, the trained virtual learning content generator and the trained learning behavior generator of the students to sequentially obtain the characteristics of the virtual students, the virtual learning content and the learning behaviors of the students;

step six: combining the characteristics of the virtual students, the virtual learning content and the learning behaviors of the students to obtain a piece of virtual data, and adding the virtual data into a virtual data set;

the post-lesson exercise recommendation module trains a preset post-lesson exercise recommendation model on the virtual data set constructed by the data enhancement module to obtain a trained post-lesson exercise recommendation model; after the student finishes the learning content of one unit, the module extracts, from the student's learning content and learning behavior, the recommendation features matching the input of the post-lesson exercise recommendation model, inputs them into the trained post-lesson exercise recommendation model, generates recommended post-lesson exercises and pushes them to the student;

the course content recommendation module trains a preset course content recommendation model on the virtual data set constructed by the data enhancement module to obtain a trained course content recommendation model; after the student finishes the learning content of a specified number of units, the module extracts, from the student's learning content and learning behavior, the recommendation features matching the input of the course content recommendation model, inputs them into the trained course content recommendation model, generates recommended course learning content and pushes it to the student;

For example, the teaching content is divided into chapters and each chapter includes a plurality of sections; the recommendation processing of the post-lesson exercise recommendation module is performed for each section, and that of the course content recommendation module for each chapter. The recommended course learning content can be a section, or extension content for a section, that the student is advised to study to consolidate learning.

Further, the basic information of the student comprises: the student's nationality, education level, age, gender, student type, student score, student certificates, etc.; the learning content information comprises: course ID, course difficulty, post-lesson exercise ID, post-lesson exercise difficulty, etc.; the learning behavior comprises: the number of course interactions, the number of days on which the course was accessed, the number of video plays, the number of chapters studied, the number of forum posts, the accuracy rate on post-lesson exercises, etc.; each entry of the data set consists of a student's basic information together with that student's corresponding learning content and learning behavior.

Further, the student characteristics include: the student's nationality, education level, age, gender, student type, student score, student certificates, etc.; the learning content characteristics include: course ID, course difficulty, post-lesson exercise ID, post-lesson exercise difficulty, etc.; the learning behavior characteristics include: the number of course interactions, the number of days on which the course was accessed, the number of video plays, the number of chapters studied, the number of forum posts, the accuracy rate on post-lesson exercises, etc.

In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:

Compared with prior work, the invention realizes data generation in the education field for the first time. A virtual student generator and a student learning behavior generator are trained with generative adversarial networks to produce a virtual data set, and the generated data is used to train the course content recommendation model and the post-lesson exercise recommendation model. This helps students consolidate the knowledge points they have learned and broaden their knowledge, and resolves the training difficulty that earlier education recommendation models faced because student data was scarce and hard to acquire.

Drawings

FIG. 1 is a block diagram of an online instructional recommendation system with data enhancement in accordance with the present invention;

FIG. 2 is a flow chart of a method for generating virtual students, learning content, and learning behaviors provided by the present invention;

FIG. 3 is a schematic diagram of a partitioned data set in a method for generating virtual students, learning content and learning behaviors according to the present invention;

FIG. 4 is a schematic diagram of training the virtual student generator provided by the present invention;

FIG. 5 is a schematic diagram of training the virtual student learning behavior generator provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.

As shown in fig. 1, an online teaching recommendation system with data enhancement mainly includes the following functional modules: the system comprises a data collection module, a data enhancement module, a post-lesson exercise recommendation module and a course content recommendation module.

Wherein, the data collection module is mainly used for: receiving and storing basic student information input by students; acquiring and storing learning content information of students and behavior data of the students after the students agree; the collected data are integrated into one data set.

In this embodiment, the basic information of the student includes the student's city of residence, education level, age, gender, student certificates, learning score, registration time, user level, following and follower counts, number of groups joined and number of groups created; the learning content information includes course ID, course difficulty, learning stage, post-lesson exercise ID and post-lesson exercise difficulty; the learning behaviors of the student include the number of logins during the course, the number of learning days during the course, the last login time, the course score, the number of video plays, the number of chapters studied, the number of forum posts, the number of questions answered, the number of post-lesson exercises completed and the accuracy on post-lesson exercises of each difficulty.

Referring to FIG. 2, the data enhancement module generates a larger-capacity virtual data set from the real data set produced by the data collection module. The specific processing steps are as follows:

Step one: taking the data set generated by the data collection module as a template, divide its characteristics into student characteristics, learning content characteristics and learning behavior characteristics.

Step two: train a virtual student generator using a GAN on the student characteristic data in the real data set.

Step three: train a virtual learning content generator using a GAN on the learning content data in the real data set.

Step four: train a student learning behavior generator using a CTGAN on the student characteristics and learning behavior data in the real data set.

Step five: after the virtual student generator, the virtual learning content generator and the student learning behavior generator have been trained, input noise to obtain, in sequence, virtual student characteristics, virtual learning content and the corresponding learning behavior.

Step six: combine the virtual student characteristics, virtual learning content and corresponding learning behavior from step five into one piece of virtual data and add it to the virtual data set; repeat from step five to obtain a virtual data set containing a large amount of reliable data.
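Steps five and six can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `make_stub_generator` merely stands in for a trained generator network, and passing the student and content features to the behavior generator as a condition is an assumption drawn from the description of the CTGAN generator's input later in this embodiment.

```python
import random


def make_stub_generator(field_names, seed):
    """Hypothetical stand-in for a trained generator network.

    Returns a function that maps a noise vector (and optional conditioning
    features) to a dict of feature values. A real system would call the
    trained GAN or CTGAN generator here instead.
    """
    rng = random.Random(seed)
    weights = {name: rng.uniform(-1.0, 1.0) for name in field_names}

    def generate(noise, condition=None):
        base = sum(noise) / len(noise)
        shift = sum(condition.values()) / len(condition) if condition else 0.0
        return {name: w * base + shift for name, w in weights.items()}

    return generate


def build_virtual_dataset(n, student_gen, content_gen, behavior_gen,
                          noise_dim=8, seed=0):
    """Steps five and six: sample the three generators and merge their outputs."""
    rng = random.Random(seed)

    def noise():
        # Noise is drawn from a standard normal distribution.
        return [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]

    dataset = []
    for _ in range(n):
        student = student_gen(noise())                 # virtual student characteristics
        content = content_gen(noise())                 # virtual learning content
        behavior = behavior_gen(noise(),               # corresponding learning behavior
                                condition={**student, **content})
        dataset.append({**student, **content, **behavior})  # one piece of virtual data
    return dataset
```

Calling `build_virtual_dataset` with a larger `n` corresponds to repeating steps five and six until the virtual data set reaches the desired capacity.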

Further, in order to measure the reliability of the virtual data set, the data enhancement module of the present invention further performs the following processing:

The real data set and the virtual data set are each used to train a preset dropout prediction model, yielding two trained dropout prediction models. Real student data and course data are then input into both models, and their outputs are compared to obtain the reliability of the virtual data set. The dropout prediction model may adopt a network structure based on a convolutional neural network, taking the student's learning behavior as input and outputting the probability that the student drops out.
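The comparison of the two models can be sketched as follows. The text does not say how the outputs are compared, so the two measures used here (mean absolute gap between predicted probabilities, and agreement of the thresholded decisions) are assumptions, as are the function name and the 0.5 threshold. The two models are represented as plain callables that return a dropout probability.

```python
def virtual_data_reliability(model_real, model_virtual, samples, threshold=0.5):
    """Compare two dropout predictors on the same real samples.

    model_real    -- predictor trained on the real data set
    model_virtual -- predictor trained on the virtual data set
    samples       -- real student and course records fed to both models
    Returns the mean absolute probability gap and the share of samples on
    which both models make the same dropout / no-dropout decision.
    """
    p_real = [model_real(s) for s in samples]
    p_virtual = [model_virtual(s) for s in samples]
    n = len(samples)
    mean_abs_gap = sum(abs(a - b) for a, b in zip(p_real, p_virtual)) / n
    agreement = sum((a >= threshold) == (b >= threshold)
                    for a, b in zip(p_real, p_virtual)) / n
    return {"mean_abs_gap": mean_abs_gap, "agreement": agreement}
```

A small gap and a high agreement rate would suggest that a model trained on the virtual data behaves like one trained on the real data.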

In this embodiment, in step one of the method for generating virtual students, virtual learning content and learning behavior, the student characteristics include the student's city of residence, education level, age, gender, student certificates, learning score, registration time, user level, following and follower counts, number of groups joined and number of groups created; the course characteristics include course ID, course difficulty, learning stage, post-lesson exercise ID and post-lesson exercise difficulty; the learning behaviors of the student include the number of logins during the course, the number of learning days during the course, the last login time, the course score, the number of video plays, the number of chapters studied, the number of forum posts, the number of questions answered, the number of post-lesson exercises completed and the accuracy on post-lesson exercises of each difficulty.

Among the characteristics above, the student certificates are the number of course passing certificates obtained before the student account is created; the learning score is the total of points earned across all previous courses by answering questions, taking tests, studying courses and the like; the following count is the number of teachers the student follows; the follower count is the number of other students who follow the student; a group is a study group that students can freely join or create; the number of groups joined counts only joined groups with more than ten members, and the number of groups created counts only created groups with more than ten members. Course difficulty has four levels: 'simple', 'normal', 'difficult' and 'unset'. Post-lesson exercise difficulty has four levels: 'simple', 'normal', 'difficult' and 'extended'. The learning stage of a course has three levels: 'first stage', 'second stage' and 'third stage'.

As shown in FIG. 3, in step one of the method for generating virtual students, virtual learning content and learning behavior in this embodiment, the relevance between features must be judged carefully when dividing the features of the real data set. A student's ID number, for example, is clearly unrelated to the student's learning results, so it is placed under other information and is not needed in the virtual data set. Attributes inherent to the student that are fixed before the course begins, such as gender and education level, belong in the student characteristics. Basic attributes of the course, such as course difficulty, belong in the course characteristics. Attributes tied to the teaching process, such as the number of logins and the number of questions answered, belong in the learning behavior characteristics, as important indicators for observing learning behavior. Common sense indicates that a student's learning behavior characteristics are influenced by the student's basic characteristics.
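The partition described above might look like the following sketch. The English field names are hypothetical renderings of the items listed in this embodiment, and the "other" bucket is an assumed place for fields such as the student number that are left out of the virtual data set.

```python
# Hypothetical column names for the three feature groups of this embodiment.
STUDENT_FIELDS = {
    "city", "education_level", "age", "gender", "certificates",
    "learning_score", "registration_time", "user_level",
}
CONTENT_FIELDS = {
    "course_id", "course_difficulty", "learning_stage",
    "exercise_id", "exercise_difficulty",
}
BEHAVIOR_FIELDS = {
    "login_count", "learning_days", "last_login_time", "course_score",
    "video_plays", "chapters_studied", "forum_posts",
    "questions_answered", "exercises_done", "exercise_accuracy",
}


def partition_record(record):
    """Split one real record into student, content, behavior and other fields."""
    groups = {"student": {}, "content": {}, "behavior": {}, "other": {}}
    for key, value in record.items():
        if key in STUDENT_FIELDS:
            groups["student"][key] = value
        elif key in CONTENT_FIELDS:
            groups["content"][key] = value
        elif key in BEHAVIOR_FIELDS:
            groups["behavior"][key] = value
        else:
            # Fields unrelated to learning results, e.g. a student number.
            groups["other"][key] = value
    return groups
```

Only the first three groups would be passed on to generator training; the "other" group is discarded.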

As shown in FIG. 4, in step two of the method for generating virtual students, learning content and learning behavior in this embodiment, the virtual student generator is trained with the GAN as follows:

(S31) setting up the GAN network.

Set up a generator network G that takes noise data as input and outputs virtual student characteristics, with network parameters denoted θ_G;

set up a discriminator network D that takes student characteristics as input and outputs the probability that the data is real, with a Sigmoid activation function in its last layer and network parameters denoted θ_D;

The specific structures of the generator network G and the discriminator network D may follow any conventional network structure; the present invention does not limit them.

(S32) inputting data, and performing deep learning training on GAN network parameters until convergence (i.e. meeting the preset training end condition).

(S32-1) Randomly sample k noise data Z₁, Z₂, …, Z_k from the normal distribution and k pieces of real data X₁, X₂, …, X_k, and pass the k noise samples through the generator network G to produce k pieces of virtual data G(Z₁), G(Z₂), …, G(Z_k);

using the real data X = (X₁, X₂, …, X_k) and the generated virtual data (G(Z₁), G(Z₂), …, G(Z_k)), update the network parameters θ_D of the discriminator network D with the Adam optimization algorithm and learning rate γ (a preset value, set to 0.0002 in this embodiment), where the gradient is taken with respect to θ_D, D(·) denotes the output of the discriminator network D and G(·) denotes the output of the generator network G;

(S32-2) randomly draw k noise samples Z₁, Z₂, …, Z_k, pass them through the generator network G to produce k pieces of virtual data G(Z₁), G(Z₂), …, G(Z_k), and update the network parameters θ_G of the generator network G, where the gradient is taken with respect to θ_G;

(S32-3) the above steps (S32-1) and (S32-2) are cyclically performed until a preset training end condition is satisfied, for example, the number of cycles (training times) reaches a preset upper limit.
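The update formulas for steps (S32-1) and (S32-2) are not reproduced in this text. A form consistent with the symbols defined there (Adam, γ, θ_D, θ_G, D(·), G(·), batch size k) would be the standard GAN updates below. This is a reconstruction under that assumption, not the patent's own formulas, and the sign convention assumes that Adam minimizes its objective.

```latex
% Discriminator update (S32-1): minimize the negative log-likelihood
\theta_D \leftarrow \mathrm{Adam}\!\left(
  \nabla_{\theta_D}\!\left[-\frac{1}{k}\sum_{i=1}^{k}
    \Bigl(\log D(X_i) + \log\bigl(1 - D(G(Z_i))\bigr)\Bigr)\right],
  \ \theta_D,\ \gamma\right)

% Generator update (S32-2): make generated data harder to reject
\theta_G \leftarrow \mathrm{Adam}\!\left(
  \nabla_{\theta_G}\!\left[\frac{1}{k}\sum_{i=1}^{k}
    \log\bigl(1 - D(G(Z_i))\bigr)\right],
  \ \theta_G,\ \gamma\right)
```

Because the last layer of D is a Sigmoid, D(·) lies in (0, 1) and both logarithms are well defined.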

(S33) the trained generator network G is saved as a trained virtual student generator.

As shown in fig. 5, in step three of the method for generating virtual students, learning contents, and learning behaviors in the present embodiment, the flow of the GAN training virtual learning content generator is substantially the same as that in step two, except that the real data source is changed from student data to learning content data.

In step four of the method for generating virtual students, learning content and learning behaviors, the learning behavior generator for virtual students is trained with the CTGAN as follows:

(S41) setting up the CTGAN network.

Set up a generator network G that takes student characteristics and learning content as input and outputs learning behavior, with network parameters θ_g;

set up a discriminator network D that takes student characteristics, learning content and learning behavior as input and outputs the probability that the input data is real, with network parameters θ_d.

(S42) inputting data, and performing deep learning training on the CTGAN network parameters until convergence (i.e. meeting the preset training end condition).

(S42-1) Over a number of loop iterations, repeatedly update the network parameters θ_d of the discriminator network D;

each iterative update of θ_d comprises b rounds of first sampling processing, and each round proceeds as follows:

from the distribution P_r of the real data set, randomly draw a sample x ~ P_r, where x contains student characteristics, learning content and learning behavior; from the data distribution P_z of student characteristics and learning content, randomly draw a sample z ~ P_z, and generate a learning behavior G(z) through the generator network G, where G(·) denotes the output of the generator network G;

from the sample z and the learning behavior G(z), form the data F(z) = (z, G(z));

compute the estimated sample x̂, where ε is drawn at random from U[0, 1];

compute the loss function L⁽ⁱ⁾ of the i-th round, i = 1, 2, …, b, in which: D(·) denotes the output of the discriminator network D; E[·] denotes the mathematical expectation, with the subscript specifying the variable, for example the expectation over the estimated samples x̂ or over samples x that follow the distribution P_r; the gradient is taken with respect to the estimated sample x̂; D(x′) and D(x″) denote the two outputs of the discriminator network D when x is input twice after a dropout layer (used to reduce neural network overfitting) has been added; the function d(·, ·) denotes the Euclidean distance between its two arguments; and ω₁ and ω₂ are two preset weights;

after the b rounds of first sampling processing, update the network parameters θ_d with the Adam optimization algorithm, where γ denotes the learning rate, b denotes the number of samples and the gradient is taken with respect to θ_d;

(S42-2) draw b random samples z⁽ⁱ⁾ from P_z, where i = 1, 2, …, b;

input each sample z⁽ⁱ⁾ into the generator network G to obtain its output G(z⁽ⁱ⁾), giving F(z⁽ⁱ⁾) = (z⁽ⁱ⁾, G(z⁽ⁱ⁾));

input each F(z⁽ⁱ⁾) into the discriminator network D to obtain the output D(F(z⁽ⁱ⁾));

and update the network parameters θ_g of the generator network G accordingly;

(S42-3) the above steps (S42-1) and (S42-2) are cyclically performed until a preset training end condition is satisfied, for example, the number of cycles reaches a preset upper limit.
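The formulas for step (S42) are not reproduced in this text. The symbols that are defined (ε ~ U[0, 1], the estimated sample x̂, a gradient with respect to x̂, the dropout outputs D(x′) and D(x″), the distance d, the weights ω₁ and ω₂) match a Wasserstein loss with a gradient penalty and a consistency term. The following is a reconstruction under that assumption. The margin M′ is a hypothetical constant that does not appear in the source, and the expectations are approximated by averaging over the b rounds.

```latex
% Estimated (interpolated) sample
\hat{x} = \epsilon\, x + (1 - \epsilon)\, F(z), \qquad \epsilon \sim U[0, 1]

% Loss of the i-th round (Wasserstein term + gradient penalty + consistency term)
L^{(i)} = D\bigl(F(z)\bigr) - D(x)
  + \omega_1 \bigl(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\bigr)^2
  + \omega_2 \max\bigl(0,\ d\bigl(D(x'), D(x'')\bigr) - M'\bigr)

% Discriminator update after b rounds (S42-1)
\theta_d \leftarrow \mathrm{Adam}\!\left(
  \nabla_{\theta_d} \frac{1}{b}\sum_{i=1}^{b} L^{(i)},\ \theta_d,\ \gamma\right)

% Generator update (S42-2)
\theta_g \leftarrow \mathrm{Adam}\!\left(
  \nabla_{\theta_g} \frac{1}{b}\sum_{i=1}^{b} -D\bigl(F(z^{(i)})\bigr),\ \theta_g,\ \gamma\right)
```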

(S43) Save the trained generator network G as the trained learning behavior generator for virtual students.

That is, in this embodiment, during CTGAN training the discriminator is trained to distinguish virtual data from real data, while the generator is trained to make the discriminator unable to tell them apart. The two play against each other and are updated continually until a suitable generator is obtained. In each cycle the discriminator is trained several times and the generator then receives a small update, and this cycle repeats. This is because the generator's training direction depends on the discriminator's output, so a suitable generator can be trained only if the discriminator performs well enough.
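The alternating schedule described above can be sketched as a plain loop. The function and parameter names are hypothetical, and the two update steps are passed in as callables so the sketch stays independent of any deep learning library; `discriminator_steps` stands for the unspecified "several times".

```python
def run_training_schedule(update_discriminator, update_generator,
                          cycles, discriminator_steps):
    """Alternate several discriminator updates with one generator update.

    update_discriminator -- callable performing one update of theta_d (S42-1)
    update_generator     -- callable performing one update of theta_g (S42-2)
    cycles               -- preset upper limit on the number of cycles (S42-3)
    discriminator_steps  -- discriminator updates per cycle (assumed parameter)
    Returns the number of discriminator and generator updates performed.
    """
    d_updates = 0
    g_updates = 0
    for _ in range(cycles):
        for _ in range(discriminator_steps):
            update_discriminator()   # strengthen the discriminator first
            d_updates += 1
        update_generator()           # then take one small generator step
        g_updates += 1
    return d_updates, g_updates
```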

The post-lesson problem recommendation module is used for: training a preset recommendation model by using a data set generated by a data enhancement module; and then using the trained recommendation model as a post-session exercise recommender, generating recommended post-session exercises according to the conditions of the students by the post-session exercise recommender, and pushing the recommended post-session exercises to the students. In this embodiment, the student receives a question pushed by the system after learning of each section of content is completed, and the system pushes the next question according to the recommendation model after the learning is completed.

In this embodiment, the post-lesson exercise recommendation module uses a stepwise regression model. Its inputs are the student characteristics, the course content, and the student's existing learning behavior data; its outputs are the predicted accuracy rates for the four post-lesson exercise difficulties. Checking the difficulties in the order simple, normal, difficult and extended, the module selects the first difficulty whose predicted accuracy rate is below 80%, then randomly draws ten post-lesson exercises of that difficulty from the question bank and pushes them to the student.
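The selection rule in the preceding paragraph can be illustrated with a minimal sketch. The names `select_difficulty`, `recommend_exercises` and `question_bank` are hypothetical, and the predicted accuracy rates are assumed to be already available as a dictionary. The text does not say what happens when every difficulty is predicted at 80% or above; falling back to the extended difficulty is an assumption made here.

```python
import random

# Difficulty levels in the order in which the text checks them.
DIFFICULTY_ORDER = ["simple", "normal", "difficult", "extended"]


def select_difficulty(predicted_accuracy, threshold=0.80):
    """Return the first difficulty whose predicted accuracy is below threshold."""
    for level in DIFFICULTY_ORDER:
        if predicted_accuracy[level] < threshold:
            return level
    # Assumption: the source does not define this case; use the last level.
    return DIFFICULTY_ORDER[-1]


def recommend_exercises(predicted_accuracy, question_bank, n=10, rng=None):
    """Randomly draw n exercises of the selected difficulty from the bank."""
    rng = rng or random.Random()
    level = select_difficulty(predicted_accuracy)
    pool = question_bank[level]
    # Draw without replacement; cap at the pool size if the bank is small.
    return level, rng.sample(pool, min(n, len(pool)))
```

For example, with predicted accuracy rates of 0.95, 0.85, 0.70 and 0.40 for the four difficulties, the first value below 0.80 belongs to "difficult", so ten exercises of that difficulty are drawn.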

Stepwise regression is a method of multiple linear regression analysis in which variables are introduced into the model one by one. After each explanatory variable is introduced, an F test is performed, and t tests are performed one by one on the explanatory variables already selected; if a previously introduced explanatory variable is no longer significant because of variables introduced after it, it is deleted. This ensures that the regression equation contains only significant variables before each new variable is introduced. The specific steps are as follows. First, a unary linear regression is performed between each independent variable and the dependent variable, and the independent variables are ranked by their degree of influence on the dependent variable. The independent variables are then introduced in descending order of influence. Each time a new independent variable is introduced, the independent variables and the regression equation are tested: significant variables are retained and non-significant ones are removed, until no new independent variable can be introduced.
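The procedure described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the embodiment's implementation: the function names are hypothetical, significance is judged by comparing a partial F statistic with fixed thresholds `f_enter` and `f_remove` (assumed values; the text gives no significance level), and the columns are assumed not to be collinear and the fit not to be exact. For a single variable the partial F statistic equals the square of its t statistic, so the removal check plays the role of the t test.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def rss(X, y, cols):
    """Residual sum of squares of a least-squares fit (with intercept) on cols."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    k = len(cols) + 1
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(AtA, Aty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def partial_f(X, y, base, extra):
    """Partial F statistic for adding column `extra` to the model on `base`."""
    full = base + [extra]
    rss_full = rss(X, y, full)
    df = len(y) - len(full) - 1  # residual degrees of freedom of the full model
    return (rss(X, y, base) - rss_full) / (rss_full / df)


def stepwise(X, y, f_enter=4.0, f_remove=4.0):
    """Return the indices of the columns kept by the stepwise procedure."""
    p = len(X[0])
    # Rank variables by their univariate influence on the dependent variable.
    order = sorted(range(p), key=lambda c: partial_f(X, y, [], c), reverse=True)
    selected = []
    for c in order:
        if partial_f(X, y, selected, c) < f_enter:
            continue  # not significant: do not introduce this variable
        selected.append(c)
        # Re-test earlier variables; drop any that is no longer significant.
        for old in selected[:-1]:
            others = [s for s in selected if s != old]
            if partial_f(X, y, others, old) < f_remove:
                selected.remove(old)
    return selected
```

On a small two-level factorial design in which the dependent variable depends only on the first two columns, `stepwise` keeps those two columns and leaves the third out.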

The course content recommendation module is used for: training a recommendation model with the data set generated by the data enhancement module; using the trained recommendation model as a course content recommender; and having the recommender generate recommended course learning content according to each student's situation and push it to the student. In this embodiment, the student receives the recommended course learning content pushed by the system after finishing each chapter. The recommended learning content is either a section of the chapter that the student is advised to consolidate, or extension content.

In this embodiment, the course content recommendation module uses a stepwise regression model. Its inputs are the student characteristics, the course content, and the student's existing learning behavior data; its output is, for each section in the chapter, the predicted accuracy rate on post-lesson exercises of "normal" difficulty. The system pushes the sections whose predicted accuracy rate is below 60% to the student for consolidation; if all sections are above 60%, the student is recommended to study the extension content of the chapter.
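The rule in the preceding paragraph can be sketched in a few lines. The function name `recommend_course_content` and the result format are hypothetical, and the per-section predicted accuracy rates are assumed to be given. The text does not state how a value of exactly 60% is handled; treating it as not needing consolidation is an assumption made here.

```python
def recommend_course_content(section_accuracy, threshold=0.60):
    """Choose sections to consolidate, or the chapter's extension content.

    section_accuracy maps each section of the chapter to its predicted
    accuracy rate on "normal"-difficulty post-lesson exercises.
    """
    # Sections predicted below the threshold are pushed for consolidation.
    weak = [s for s, acc in section_accuracy.items() if acc < threshold]
    if weak:
        return {"type": "consolidate", "sections": weak}
    # Assumption: a value of exactly 60% counts as passed.
    return {"type": "extension", "sections": []}
```

For instance, predicted accuracy rates of 0.75, 0.55 and 0.58 for three sections lead to the second and third sections being pushed, whereas 0.75, 0.80 and 0.90 lead to the extension content.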

While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims

1. An online teaching recommendation system with data enhancement, comprising: a data collection module, a data set enhancement module, a post-session exercise recommendation module and a course content recommendation module,

the data collection module is used for receiving and storing the student basic information input by the students, and collecting and storing the learning content information of the students and the behavior data of the students after the students agree; constructing a real data set based on the basic information of the students, the learning content information of the students and the learning behavior data of the students;

the course content recommendation module is used for training a preset post-class content recommendation model based on the virtual data set constructed by the data enhancement module to obtain a trained post-class content recommendation model; and, after the student finishes the learning content of a plurality of specified units, extracting post-class content recommendation features matching the input of the post-class content recommendation model based on the learning content and the learning behavior of the student, inputting the extracted post-class content recommendation features into the trained post-class content recommendation model, generating the recommended course learning content and pushing the recommended course learning content to the student.

2. The system of claim 1, wherein the student basic information, the student learning content information, and the student learning behavior data are respectively:

the basic information of the student comprises: nationality, subject history, age, gender, student type, student score, student certificate of the student;

the learning content information of the student includes: course ID, course difficulty, after-class problem ID, after-class problem difficulty;

the learning behavior data of the student includes: number of course interactions, number of course access days, number of video plays, chapters studied, forum posts and after-class exercises.

3. The system of claim 2, wherein the student characteristics, learning content characteristics, and learning behavior characteristics are, respectively:

the student characteristics include: nationality, subject history, age, gender, student type, student score, student certificate of the student;

the learning content features include: course ID, course difficulty, after-class problem ID, after-class problem difficulty;

the learning behavior features include: number of course interactions, number of course access days, number of video plays, number of chapters studied, number of forum posts and accuracy rate of after-class exercises.

4. The system of claim 2 or 3, wherein the post-lesson exercise recommendation module generates recommended post-lesson exercises and pushes them to the student after the student completes each section of learning content; and the course content recommendation module generates recommended course learning content and pushes it to the student after the student completes each chapter, wherein the recommended course learning content is a section in the chapter that the student is advised to consolidate, or extension content.

5. The system according to any one of claims 1 to 4, wherein the virtual student generator is trained using the GAN in combination with the student characteristic data in the real data set, specifically as follows:

(1) setting a GAN network:

setting a generator network G, inputting noise data and outputting virtual student characteristics;

setting a discriminator network D, which takes student characteristics as input and outputs the probability that the currently input student characteristics are real data, the last layer of the discriminator network D adopting a Sigmoid activation function;

(2) inputting data, and carrying out deep learning training on GAN network parameters:

(2-1) randomly sampling k noise data Z₁, Z₂, …, Z_k from a normal distribution and k pieces of real data X₁, X₂, …, X_k, and generating k virtual data G(Z₁), G(Z₂), …, G(Z_k) from the k noise data through the generator network G;

updating the network parameter θ_D of the discriminator network D:

wherein Adam represents the Adam optimization algorithm and γ represents the learning rate;

(2-2) randomly sampling k noise data Z₁, Z₂, …, Z_k, generating k virtual data G(Z₁), G(Z₂), …, G(Z_k) through the generator network G, and updating the network parameter θ_G of the generator network G:

wherein

denotes taking the derivative with respect to θ_G;

(2-3) circularly executing the steps (2-1) and (2-2) until a preset training end condition is met;

(3) taking the trained generator network G as the trained virtual student generator.

6. The system according to any one of claims 1 to 4, wherein the learning behavior generator of the virtual student is trained using the CTGAN together with the student characteristic data and the learning behavior characteristic data in the real data set, specifically as follows:

step 1: setting a CTGAN network:

setting a generator network G, inputting student characteristics and learning content characteristics, and outputting learning behavior characteristics;

setting a discriminator network D, which takes student characteristics, learning content characteristics and learning behavior characteristics as input and outputs the probability that the input data is real data;

step 2, inputting data, and performing deep learning training on CTGAN network parameters:

step 2-1: iteratively updating the network parameter θ_d of the discriminator network D over multiple cycles;

wherein each iterative update of the network parameter θ_d comprises b rounds of first sampling processing, and each round of first sampling processing specifically comprises:

based on the distribution P_r of the real data set, randomly sampling from P_r to obtain a sample x, wherein the sample x comprises student characteristics, learning content characteristics and learning behavior characteristics;

based on the data distribution P_z of the student characteristics and learning content characteristics, randomly sampling from P_z to obtain a sample z, and generating a learning behavior G(z) through the generator network G;

based on the learning behavior G(z) and the sample z, obtaining the data F(z) = (z, G(z));

according to the formula

obtaining the estimated sample

wherein ε is randomly sampled from the uniform distribution U[0,1];

according to the formula

calculating the loss function L⁽ⁱ⁾ of the i-th iteration of the loop;

wherein i = 1, 2, …, b;

D(·) represents the output of the discriminator network D;

E[·] represents the mathematical expectation;

denotes that, with respect to

the derivative is taken;

wherein D(x′) and D(x″) represent the two outputs of the discriminator network D obtained by inputting x twice after the dropout layer is added, and the function d(·,·) represents the Euclidean distance between its two arguments;

ω₁ and ω₂ represent two preset weights;

denotes taking the derivative with respect to θ_d;

step 2-2: randomly sampling b times from the data distribution P_z to obtain samples z⁽ⁱ⁾, where i = 1, 2, …, b;

inputting each sample z⁽ⁱ⁾ into the generator network G to obtain its output G(z⁽ⁱ⁾), thereby obtaining F(z⁽ⁱ⁾) = (z⁽ⁱ⁾, G(z⁽ⁱ⁾));

updating the network parameter θ_g of the generator network G:

step 2-3: circularly executing the steps 2-1 and 2-2 until a preset training end condition is met;

step 3: taking the trained generator network G as the trained learning behavior generator of the virtual student.

7. The system of claim 1, wherein the data enhancement module further performs the following:

training a preset student-dropout prediction model separately on the real data set and on the virtual data set to obtain two trained student-dropout prediction models; inputting real student data and course data into each of the two models; and comparing the output results of the two models to obtain a reliability result for the virtual data set.