CN112784154A - Online teaching recommendation system with data enhancement - Google Patents

Publication number
CN112784154A
Authority
CN
China
Prior art keywords
student
learning
data
virtual
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011625667.2A
Other languages
Chinese (zh)
Other versions
CN112784154B (en)
Inventor
左琳
刘念伯
杨腾杰
杨梅乙
邹源甦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202011625667.2A
Publication of CN112784154A
Application granted
Publication of CN112784154B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Molecular Biology (AREA)
  • Tourism & Hospitality (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an online teaching recommendation system with data enhancement, belonging to the field of information technology. The system comprises a data collection module, a data enhancement module, an after-class exercise recommendation module and a course content recommendation module. The data collection module receives the students' basic information and constructs a real data set. The data enhancement module generates a large-capacity virtual data set from the real data set constructed by the data collection module. The after-class exercise recommendation module and the course content recommendation module each train their own recommendation model on the virtual data set. Depending on the student's learning progress, after a student finishes the learning content of one unit, the after-class exercise recommendation module generates recommended after-class exercises and pushes them to the student; after the student finishes the learning content of a specified number of units, the course content recommendation module generates recommended course learning content and pushes it to the student. The invention is used for online teaching and addresses the lack of training data in existing education recommendation techniques.

Description

Online teaching recommendation system with data enhancement
Technical Field
The invention belongs to the field of information technology, and in particular relates to an online teaching recommendation technique with data enhancement.
Background
The number of online teaching sites keeps growing. On most of them, students watch a complete set of learning videos and then check what they have learned through after-class exercises or tests. Demand for teaching quality is rising, and personalized teaching has become an important requirement. Personalized teaching generally refers to an education mode that customizes education targets, education plans, tutoring schemes and the like for each student by collecting data on the student's learning process and taking the student's academic ability and learning characteristics into account. Compared with offline learning, online learning has the drawback that no real teacher is present. Intelligent teaching assistance systems have therefore emerged: they build models from collected student data and then recommend learning-related content to each student individually. Models such as support vector machines, neural networks and stepwise regression have all achieved some success, mainly in performance prediction, dropout prediction and the like.
However, these models share a common limitation: the data set. The usefulness of a model trained by machine learning depends heavily on the size and quality of its training data set. If the quality of the data set is low, a stable model cannot be trained; if the data set is too small, the trained model generalizes poorly. Accurate personalized recommendation therefore requires a better data set as well as a better model.
Data collection has always been difficult in the education field. Early education research drew its data mainly from surveys such as questionnaires. Questionnaires are inflexible and easily misunderstood, and most of them restrict answers to ranges fixed in advance by the designer, so respondents are limited in what they can say and more detailed, deeper information may be missed. Alternatively, research teams obtained data through long-term experimental observation, which is time-consuming and labor-intensive and interferes with teaching to some extent. With the rapid development of education informatization, online education platforms are multiplying and provide a convenient channel for data collection. Data collection on these platforms nevertheless faces problems of course popularity, number of students, course withdrawal rate, sparse user behavior and the like. For example, a newly launched course or teaching method may struggle to attract enough students, or may see very little interactive learning behavior, and such low-quality data makes the course or teaching method very hard to evaluate and analyze. There is also a privacy problem: many users do not want their behavior data to be collected and stored. In addition, most data collected by websites is unstructured and needs secondary processing before it becomes usable; for example, students' answer data and the related test questions must be manually labeled with knowledge points, which is time-consuming and labor-intensive.
The difficulty of obtaining experimental data greatly limits research in the education field. Approaching the problem from the perspective of simulated data, one can collect a small data set and build a reasonable student behavior simulator that generates realistic student behavior data. The generated data keeps the format of the original data set and so enhances it, and it can be used directly to evaluate teaching models or to optimize recommendation algorithms, which greatly improves the feasibility of data-driven education research and teaching innovation.
Data generation refers to generating virtual data from an existing model or data class. The generated data should match the characteristics of the original model or data class closely enough to pass for real data, but it is not a copy: while satisfying the characteristics of the data class, it should differ sufficiently from the data already in that class. Data generation based on artificial intelligence is mainly applied to image generation. The main methods are based on the generative adversarial network (GAN), from which continued improvement has produced WGAN (Wasserstein GAN), CTGAN (Consistency Term GAN) and others.
In the prior art, Chinese patent application publication No. CN108711138A discloses a grayscale image colorization method based on a generative adversarial network, whose goal is for the trained network to generate high-resolution, high-quality images. First, samples are fed into a generative adversarial network to start training; once training is stable, PGGAN is used to raise the resolution of the generated images. Then WGAN-GP is added to improve the original adversarial network, addressing unstable gradients and mode collapse and improving the optimization process of the generative adversarial network. Finally, the conditional constraint of CGAN is added so that the network can generate images in a specified style according to the given conditions. However, this technique does not extend generative adversarial networks to other fields. Moreover, it uses a single network to generate the virtual result, whereas the data involved in education research generally comprises student characteristic information, learning content and the students' learning results; generating all of this with a single network would cause large errors.
Disclosure of Invention
The object of the invention is as follows: in view of the above problems, an online teaching recommendation system with data enhancement is provided, which automatically recommends learning content and after-class exercises according to each student's situation during online learning. A virtual education data set is produced by generating virtual students, which addresses the problems that samples are scarce, hard and costly to acquire, and subject to personal privacy protection. The recommendation models are trained on a large amount of virtual data and help students consolidate what they have learned and study further during online learning; the system thus serves as an auxiliary system for online teaching.
The invention relates to an online teaching recommendation system with data enhancement, which comprises: a data collection module, a data enhancement module, an after-class exercise recommendation module and a course content recommendation module;
the data collection module is used for receiving and storing student basic information input by students, and collecting and storing learning content information of the students and behavior data of the students after the students agree; constructing a real data set based on the basic information of the students, the learning content information of the students and the behavior data of the students;
the data enhancement module generates a virtual data set according to the real data set constructed by the data collection module, wherein the data capacity of the virtual data set is greater than that of the real data set; wherein, the generation processing process of each piece of virtual data in the virtual data set is as follows:
step one: using the student information items contained in the real data set as a template, obtain the features of the real data set and divide them into three categories: student features, learning content features and learning behavior features;
step two: train a virtual student generator using a GAN on the student feature data in the real data set;
step three: train a virtual learning content generator using a GAN on the learning content feature data in the real data set;
step four: train a student learning behavior generator using CTGAN on the student feature data and the learning behavior feature data in the real data set;
step five: input noise into the trained virtual student generator, the trained virtual learning content generator and the trained student learning behavior generator respectively, obtaining in turn the features of a virtual student, virtual learning content and the student's learning behavior;
step six: combine the features of the virtual student, the virtual learning content and the student's learning behavior into one piece of virtual data, and add it to the virtual data set;
the after-class exercise recommendation module is used for training a preset after-class exercise recommendation model on the virtual data set constructed by the data enhancement module to obtain a trained after-class exercise recommendation model; after the student finishes the learning content of one unit, features matching the input of the after-class exercise recommendation model are extracted from the student's learning content and learning behavior and input into the trained after-class exercise recommendation model, which generates recommended after-class exercises that are pushed to the student;
the course content recommendation module is used for training a preset course content recommendation model on the virtual data set constructed by the data enhancement module to obtain a trained course content recommendation model; after the student finishes the learning content of a specified number of units, features matching the input of the course content recommendation model are extracted from the student's learning content and learning behavior and input into the trained course content recommendation model, which generates recommended course learning content that is pushed to the student;
for example, the teaching content is divided into chapters, each of which contains several sections; the after-class exercise recommendation module makes a recommendation for each section, and the course content recommendation module makes a recommendation for each chapter. The recommended course learning content can be a section that the student is advised to revisit for consolidation, or extension content.
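The chapter and section example can be sketched in code. The following is a minimal illustration only: the class name `Progress`, the event strings and the rule "a chapter is finished when all of its sections are finished" are assumptions made for the sketch and are not specified by the source.

```python
from dataclasses import dataclass, field


@dataclass
class Progress:
    """Tracks which sections a student has finished (hypothetical helper)."""
    chapters: dict                      # chapter id -> list of section ids
    done: set = field(default_factory=set)

    def finish_section(self, section):
        """Mark a section finished and return the recommendations it triggers."""
        self.done.add(section)
        # Every finished section triggers an after-class exercise recommendation.
        events = [f"exercises:{section}"]
        # A chapter whose sections are all finished triggers a course content
        # recommendation (assumed meaning of "a specified number of units").
        for chapter, sections in self.chapters.items():
            if section in sections and all(s in self.done for s in sections):
                events.append(f"content:{chapter}")
        return events


progress = Progress({"ch1": ["s1", "s2"]})
print(progress.finish_section("s1"))
print(progress.finish_section("s2"))
```

In a real system the returned events would be handed to the two trained recommendation models rather than printed.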
Further, the basic information of the student comprises: the student's nationality, education level, age, gender, student type, student score, student certificates, etc.; the learning content information comprises: course ID, course difficulty, after-class exercise ID, after-class exercise difficulty, etc.; the learning behavior comprises: the number of course interactions, the number of days the course was accessed, the number of video plays, the number of chapters studied, the number of forum posts, the accuracy on after-class exercises, etc.; each entry of the data set consists of a student's basic information together with that student's learning content and learning behavior.
Further, the student features comprise: the student's nationality, education level, age, gender, student type, student score, student certificates, etc.; the learning content features comprise: course ID, course difficulty, after-class exercise ID, after-class exercise difficulty, etc.; the learning behavior features comprise: the number of course interactions, the number of days the course was accessed, the number of video plays, the number of chapters studied, the number of forum posts, the accuracy on after-class exercises, etc.
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
Compared with prior work, the invention carries out data generation in the education field for the first time. It trains the virtual student generator and the student learning behavior generator through generative adversarial networks to produce a virtual data set, and applies the generated data to training the course content recommendation model and the after-class exercise recommendation model. This helps students consolidate the knowledge points they have learned and broaden their knowledge, and it resolves the training difficulty that earlier education recommendation models faced because student data was scarce and hard to acquire.
Drawings
FIG. 1 is a block diagram of an online instructional recommendation system with data enhancement in accordance with the present invention;
FIG. 2 is a flow chart of a method for generating virtual students, learning content, and learning behaviors provided by the present invention;
FIG. 3 is a schematic diagram of a partitioned data set in a method for generating virtual students, learning content and learning behaviors according to the present invention;
FIG. 4 is a schematic diagram of a training virtual student generator provided by the present invention;
FIG. 5 is a schematic diagram of training the virtual student learning behavior generator provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
As shown in fig. 1, an online teaching recommendation system with data enhancement mainly includes the following functional modules: the system comprises a data collection module, a data enhancement module, a post-lesson exercise recommendation module and a course content recommendation module.
Wherein, the data collection module is mainly used for: receiving and storing basic student information input by students; acquiring and storing learning content information of students and behavior data of the students after the students agree; the collected data are integrated into one data set.
In this embodiment, the basic information of a student comprises the student's city of residence, education level, age, gender, student certificates, learning score, registration time, user level, follow counts, number of groups joined and number of groups created; the learning content information comprises course ID, course difficulty, learning stage, after-class exercise ID and after-class exercise difficulty; the student's learning behavior comprises the number of logins during the course, the number of study days during the course, the last login time, the course score, the number of video plays, the number of chapters studied, the number of forum posts, the number of questions answered, the number of after-class exercises done and the accuracy on after-class exercises of each difficulty.
Referring to fig. 2, the data enhancement module generates a larger-capacity data set (the virtual data set) from the data set produced by the data collection module (the real data set). The specific processing steps are as follows:
Step one: taking the data set produced by the data collection module as a template, divide its features into student features, learning content features and learning behavior features.
Step two: train a virtual student feature generator using a GAN on the student feature data in the real data.
Step three: train a virtual learning content generator using a GAN on the learning content in the real data.
Step four: train the student learning behavior generator using CTGAN on the student features and learning behavior data in the real data.
Step five: once the virtual student generator, the virtual learning content generator and the student learning behavior generator have been trained, input noise to obtain in turn the virtual student features, the virtual learning content and the corresponding learning behavior.
Step six: combine the virtual student features, virtual learning content and corresponding learning behavior from step five and add them to the virtual data set as one piece of virtual data; repeat from step five to obtain a virtual data set containing a large amount of reliable data.
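Steps five and six amount to a simple generation loop. The sketch below illustrates that loop only: the three generator functions are hypothetical stand-ins for the trained networks, the field names are invented for the example, and passing the student and content features to the behavior generator is an assumption based on the later description of its inputs.

```python
import random


def student_generator(noise):
    """Stand-in for the trained virtual student generator (hypothetical)."""
    return {"age": 18 + int(abs(noise) * 10) % 30, "education_level": "bachelor"}


def content_generator(noise):
    """Stand-in for the trained virtual learning content generator (hypothetical)."""
    levels = ["simple", "common", "difficult", "unset"]
    return {"course_difficulty": levels[int(abs(noise) * 10) % len(levels)]}


def behavior_generator(student, content, noise):
    """Stand-in for the trained learning behavior generator (hypothetical).

    A real generator would condition on the student and content features;
    this stub ignores them and derives toy values from the noise alone.
    """
    return {
        "login_count": int(abs(noise) * 20),
        "exercise_accuracy": min(1.0, abs(noise) / 3),
    }


def build_virtual_dataset(n, seed=0):
    """Repeat steps five and six n times and collect the virtual records."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n):
        # Step five: feed fresh noise to each generator in turn.
        student = student_generator(rng.gauss(0, 1))
        content = content_generator(rng.gauss(0, 1))
        behavior = behavior_generator(student, content, rng.gauss(0, 1))
        # Step six: merge the three parts into one virtual record.
        dataset.append({**student, **content, **behavior})
    return dataset


virtual = build_virtual_dataset(1000)
print(len(virtual), sorted(virtual[0]))
```

Because each record only needs three forward passes, the virtual data set can be made much larger than the real one, which is the point of the enhancement step.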
Further, in order to measure the reliability of the virtual data set, the data enhancement module of the present invention further performs the following processing:
A preset dropout prediction model is trained separately on the real data set and on the virtual data set, giving two trained dropout prediction models. Real student data and course data are then input into both models, and their outputs are compared to obtain the reliability of the virtual data set. The dropout prediction model can use a network structure based on a convolutional neural network, taking the student's learning behavior as input and outputting the probability that the student drops out.
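The source does not say how the two models' outputs are compared. One simple possibility, shown here purely as an assumption, is the fraction of real samples on which both models reach the same dropout decision; the function name `reliability`, the 0.5 threshold and the toy models are all hypothetical.

```python
def reliability(model_real, model_virtual, samples, threshold=0.5):
    """Share of samples on which both dropout models make the same call.

    model_real / model_virtual: callables returning a dropout probability.
    The agreement-rate metric itself is an assumption, not from the source.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    agree = sum(
        (model_real(s) >= threshold) == (model_virtual(s) >= threshold)
        for s in samples
    )
    return agree / len(samples)


# Toy stand-ins for the two trained dropout prediction models.
def toy_real(sample):
    return 0.9 if sample["logins"] < 3 else 0.1


def toy_virtual(sample):
    return 0.8 if sample["logins"] < 4 else 0.2


samples = [{"logins": n} for n in (1, 2, 3, 5)]
print(reliability(toy_real, toy_virtual, samples))  # 0.75
```

A value close to 1 would suggest that a model trained on virtual data behaves like one trained on real data; other comparisons, such as the mean absolute difference of the predicted probabilities, would fit the description equally well.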
In this embodiment, in step one of the method for generating virtual students, virtual learning content and learning behavior, the student features comprise the student's city of residence, education level, age, gender, student certificates, learning score, registration time, user level, follow counts, number of groups joined and number of groups created; the course features comprise course ID, course difficulty, learning stage, after-class exercise ID and after-class exercise difficulty; the student's learning behavior comprises the number of logins during the course, the number of study days during the course, the last login time, the course score, the number of video plays, the number of chapters studied, the number of forum posts, the number of questions answered, the number of after-class exercises done and the accuracy on after-class exercises of each difficulty.
Among the above features: the student certificate count is the number of course passing certificates obtained before the student account is created; the learning score is the total of points earned across all previous study by answering questions, taking tests, studying courses and the like; the following count is the number of teachers the student follows, and the follower count is the number of other students who follow the student; a group is a study group that students can freely join or create; the number of groups joined counts the groups with more than ten members that the student has joined, and the number of groups created counts the groups with more than ten members that the student has created; course difficulty takes one of four values, 'simple', 'common', 'difficult' and 'unset'; after-class exercise difficulty takes one of four values, 'simple', 'common', 'difficult' and 'expanded'; and the learning stage of the course takes one of three values, 'first stage', 'second stage' and 'third stage'.
As shown in fig. 3, in step one of the method for generating virtual students, virtual learning content and learning behavior in this embodiment, the relevance between features must be judged carefully when dividing the features of the real data set. A student's student number, for example, is clearly unrelated to the student's learning result, so it is placed under other information, which the virtual data set does not need. Natural attributes of the student that are fixed before the course, such as gender and education level, belong to the student features. Basic attributes of the course, such as course difficulty, belong to the course features. Attributes tied to the teaching process, such as the number of logins and the number of questions answered, belong to the learning behavior features and serve as important indicators for observing learning behavior. Common sense also indicates that a student's learning behavior features are influenced by the student's basic features.
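The feature division described above can be illustrated with a small helper. This is a sketch under assumptions: the column names and the `partition_record` function are hypothetical, since the source lists the features but gives no schema.

```python
# Hypothetical column names; the source lists features but not a schema.
STUDENT_FEATURES = {"gender", "education_level", "age"}
CONTENT_FEATURES = {"course_id", "course_difficulty"}
BEHAVIOR_FEATURES = {"login_count", "answer_count"}

GROUPS = {
    "student": STUDENT_FEATURES,
    "content": CONTENT_FEATURES,
    "behavior": BEHAVIOR_FEATURES,
}


def partition_record(record):
    """Split one real-data record into the three feature groups.

    Fields that belong to no group (for example the student number) are
    collected under "other" and can be left out of the virtual data set.
    """
    parts = {name: {} for name in GROUPS}
    parts["other"] = {}
    for key, value in record.items():
        for name, columns in GROUPS.items():
            if key in columns:
                parts[name][key] = value
                break
        else:
            # No group claimed this field.
            parts["other"][key] = value
    return parts


record = {
    "student_number": "S001",
    "gender": "F",
    "course_difficulty": "common",
    "login_count": 12,
}
print(partition_record(record))
```

Each of the three groups then becomes the training data for its own generator, while the "other" fields are discarded.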
As shown in fig. 4, in step two of the method for generating virtual students, learning content and learning behavior in this embodiment, the process flow for training the virtual student generator with a GAN is as follows:
(S31) Set up the GAN network.
Set up a generator network G whose input is noise data and whose output is virtual student features, with network parameters denoted θ_G;
Set up a discriminator network D whose input is student features and whose output is the probability that the data is real data; the last layer of the discriminator uses a Sigmoid activation function, and its network parameters are denoted θ_D.
The specific network structures of the generator network G and the discriminator network D may be any conventional network structure; the invention does not limit them.
(S32) Input data and train the GAN network parameters by deep learning until convergence (i.e. until the preset training end condition is met).
(S32-1) Randomly sample k noise data Z_1, Z_2, …, Z_k from the normal distribution and k real data X_1, X_2, …, X_k, and pass the k noises through the generator network G to generate k virtual data G(Z_1), G(Z_2), …, G(Z_k);
Using the real data X = (X_1, X_2, …, X_k) and the generated virtual data (G(Z_1), G(Z_2), …, G(Z_k)), update the network parameters of the discriminator network D:
[Formula image BDA0002879206120000061]
where Adam refers to the Adam optimization algorithm, γ represents the learning rate (a preset value), which is set to 0.0002 in this embodiment, ∇θ_D denotes differentiation with respect to θ_D, D() represents the output of the discriminator network D and G() represents the output of the generator network G;
(S32-2) Randomly take k noises Z_1, Z_2, …, Z_k, generate k virtual data G(Z_1), G(Z_2), …, G(Z_k) through the generator network G, and update the network parameters of the generator network G:
[Formula image BDA0002879206120000063]
where ∇θ_G denotes differentiation with respect to θ_G;
(S32-3) the above steps (S32-1) and (S32-2) are cyclically performed until a preset training end condition is satisfied, for example, the number of cycles (training times) reaches a preset upper limit.
(S33) the trained generator network G is saved as a trained virtual student generator.
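The two update formulas of steps (S32-1) and (S32-2) survive only as image references in this text. As a rough guide, and assuming the standard GAN minibatch objective optimized with Adam (the exact form used in the patent may differ), the updates would commonly be written as below. The three-argument notation Adam(gradient, parameters, learning rate) is a shorthand introduced for this sketch.

```latex
% Discriminator step (S32-1): ascend the minibatch log-likelihood.
\theta_D \leftarrow \mathrm{Adam}\!\left(
  \nabla_{\theta_D}\,\frac{1}{k}\sum_{i=1}^{k}
  \Big[\log D(X_i) + \log\big(1 - D(G(Z_i))\big)\Big],\;
  \theta_D,\; \gamma \right)

% Generator step (S32-2): descend the generator's minibatch loss.
\theta_G \leftarrow \mathrm{Adam}\!\left(
  \nabla_{\theta_G}\,\frac{1}{k}\sum_{i=1}^{k}
  \log\big(1 - D(G(Z_i))\big),\;
  \theta_G,\; \gamma \right)
```

With the Sigmoid output layer described in (S31), D(·) lies in (0, 1), so both logarithms are well defined.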
As shown in fig. 5, in step three of the method for generating virtual students, learning content and learning behavior in this embodiment, the flow for training the virtual learning content generator with a GAN is substantially the same as in step two, except that the real data source is learning content data instead of student data.
In step four of the method for generating virtual students, learning content and learning behavior, the process flow for training the student learning behavior generator with CTGAN is as follows:
(S41) Set up the CTGAN network.
Set up a generator network G whose inputs are student features and learning content and whose output is learning behavior, with network parameters θ_g;
Set up a discriminator network D whose inputs are student features, learning content and learning behavior and whose output is the probability that the input data is real data, with network parameters θ_d.
The specific network structures of the generator network G and the discriminator network D may be any conventional network structure; the invention does not limit them.
(S42) Input data and train the CTGAN network parameters by deep learning until convergence (i.e. until the preset training end condition is met).
(S42-1) Iteratively update the network parameters θ_d of the discriminator network D over multiple loop iterations;
where each iterative update of θ_d comprises b rounds of first sampling processing, each of which proceeds as follows:
based on the distribution P_r of the real data set, randomly sample from P_r to obtain a sample x, i.e. x ∼ P_r, where the sample x contains student features, learning content and learning behavior; based on the data distribution P_z of student features and learning content, randomly sample from P_z to obtain a sample z, i.e. z ∼ P_z, and generate a learning behavior G(z) through the generator network G, where G() represents the output of the generator network G;
obtain the data f(z) = (z, G(z)) from the learning behavior G(z) and the sample z;
according to the formula
Figure BDA0002879206120000071
obtaining the estimated sample x̂, where ε is sampled at random with ε ~ U[0,1];
according to the formula
Figure BDA0002879206120000078
calculating the loss function L^(i) of the i-th round in the current loop execution, where i = 1, 2, …, b;
D() denotes the output of the discriminator network D;
Figure BDA0002879206120000079
E[] denotes the mathematical expectation, with the subscript naming the variable over which it is taken, e.g. E_x̂[] denotes the expectation over the estimated sample x̂;
∇_x̂ denotes differentiation with respect to x̂;
Figure BDA0002879206120000081
D(x′) and D(x″) denote the two outputs of the discriminator network D when x is input twice after a dropout layer (used to reduce neural network overfitting) has been added, and the function d() denotes the Euclidean distance between its two arguments;
E_{x~P_r}[] denotes the expectation over samples x that follow the distribution P_r;
ω1 and ω2 denote two preset weights;
based on the b rounds of first sampling processing, iteratively updating the network parameters θ_d:
Figure BDA0002879206120000083
where Adam denotes the Adam optimization algorithm, γ denotes the learning rate, b denotes the number of samples, and ∇_{θ_d} denotes differentiation with respect to θ_d;
(S42-2) randomly sampling b times from P_z to obtain z^(i), where i = 1, 2, …, b;
inputting each sample z^(i) into the generator network G to obtain its output G(z^(i)), thereby obtaining F(z^(i)) = (z^(i), G(z^(i)));
then inputting each F(z^(i)) into the discriminator network D to obtain the output D(F(z^(i)));
and updating the network parameters θ_g of the generator network G as follows:
Figure BDA0002879206120000085
(S42-3) the above steps (S42-1) and (S42-2) are cyclically performed until a preset training end condition is satisfied, for example, the number of cycles reaches a preset upper limit.
(S43) the trained generator network G is saved and used as the trained learning behavior generator for virtual students.
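The formulas in steps (S42-1) and (S42-2) are available only as figure images. The symbols defined around them (an interpolated sample x̂, a gradient with respect to x̂, two dropout passes D(x′) and D(x″), a distance d, and weights ω1 and ω2) resemble a Wasserstein-style critic loss with a gradient penalty and a consistency term. The block below is one possible reading under that assumption and is not a transcription of the figures: the signs, the squared penalty, the absence of a margin in the consistency term, the argument order of Adam, and the replacement of each expectation by an average over the b samples are all assumptions.

```latex
% Assumed reading of the image-only formulas; every term is a guess
% constrained only by the symbol definitions given in the text.
\begin{align}
\hat{x} &= \epsilon\, x + (1-\epsilon)\, F(z), \qquad \epsilon \sim U[0,1] \\
L^{(i)} &= D\big(F(z)\big) - D(x)
          + \omega_1 \big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2
          + \omega_2\, d\big(D(x'), D(x'')\big) \\
\theta_d &\leftarrow \mathrm{Adam}\Big(\nabla_{\theta_d}\, \frac{1}{b}\sum_{i=1}^{b} L^{(i)},\ \theta_d,\ \gamma\Big) \\
\theta_g &\leftarrow \mathrm{Adam}\Big(\nabla_{\theta_g}\, \frac{1}{b}\sum_{i=1}^{b} -D\big(F(z^{(i)})\big),\ \theta_g,\ \gamma\Big)
\end{align}
```

Under this reading, the first two terms of L^(i) push the discriminator to score real samples above generated ones, the ω1 term keeps the gradient norm at the interpolated sample close to 1, and the ω2 term penalizes disagreement between two dropout passes over the same real sample.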
That is, in this embodiment, during training of the CTGAN network, the discriminator is trained to distinguish virtual data from real data, while the generator is trained so that the discriminator cannot tell the two apart. The two networks compete and are updated in alternation until a suitable generator is obtained. In each cycle the discriminator is trained several times and the generator is then given one small update, and the cycle repeats. This order is used because the training direction of the generator depends on the output of the discriminator, so a suitable generator can be trained only when the discriminator performs well enough.
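The alternating order described here can be sketched as a plain scheduling loop. This is a minimal illustration only: `run_schedule`, `cycles`, and `discriminator_steps` are hypothetical names, and the two callbacks stand in for the real parameter updates, which the text leaves to any conventional network and optimizer.

```python
from typing import Callable, List


def run_schedule(
    cycles: int,
    discriminator_steps: int,
    update_discriminator: Callable[[], None],
    update_generator: Callable[[], None],
) -> None:
    """Run several discriminator updates, then one generator update, per cycle."""
    for _ in range(cycles):
        for _ in range(discriminator_steps):
            update_discriminator()  # stands in for one update of theta_d
        update_generator()          # stands in for one update of theta_g


# Record the order of updates instead of training real networks.
log: List[str] = []
run_schedule(
    cycles=2,
    discriminator_steps=3,
    update_discriminator=lambda: log.append("D"),
    update_generator=lambda: log.append("G"),
)
print("".join(log))  # prints DDDGDDDG
```

The number of discriminator updates per cycle is left as a parameter because the text says only that the discriminator is trained several times before each generator update.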
The post-lesson exercise recommendation module is used for: training a preset recommendation model with the data set generated by the data enhancement module; then using the trained recommendation model as a post-lesson exercise recommender, which generates recommended post-lesson exercises according to each student's situation and pushes them to the student. In this embodiment, the student receives a question pushed by the system after finishing each section of content and, once that is completed, the system pushes the next question according to the recommendation model.
In this embodiment, the post-lesson exercise recommendation module uses a stepwise regression model. Its inputs are the student characteristics, the course content, and the student's existing learning behavior data; its outputs are the predicted accuracy rates for the four post-lesson exercise difficulty levels. The module checks the levels in the order simple, normal, difficult, extended, selects the first level whose predicted accuracy rate is below 80%, then randomly draws ten post-lesson exercises of that difficulty from the question bank and pushes them to the student.
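The selection rule above can be sketched as follows. The function names, the dictionary layout, and the sample numbers are hypothetical. The fallback used when every level is predicted at or above 80% is an assumption, since the text does not cover that case.

```python
import random
from typing import Dict, List, Optional

# Order in which the difficulty levels are checked, as stated in the text.
DIFFICULTY_ORDER = ["simple", "normal", "difficult", "extended"]


def pick_difficulty(predicted_accuracy: Dict[str, float], threshold: float = 0.80) -> str:
    """Return the first level whose predicted accuracy is below the threshold."""
    for level in DIFFICULTY_ORDER:
        if predicted_accuracy[level] < threshold:
            return level
    # Assumption: if no level is below the threshold, fall back to the hardest one.
    return DIFFICULTY_ORDER[-1]


def draw_exercises(
    question_bank: Dict[str, List[str]],
    level: str,
    count: int = 10,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Randomly draw up to `count` distinct exercises of the chosen difficulty."""
    rng = rng or random.Random()
    pool = question_bank[level]
    return rng.sample(pool, min(count, len(pool)))


# Hypothetical model outputs and question bank, for illustration only.
predicted = {"simple": 0.95, "normal": 0.86, "difficult": 0.71, "extended": 0.40}
bank = {level: [f"{level}-{i}" for i in range(30)] for level in DIFFICULTY_ORDER}

level = pick_difficulty(predicted)
exercises = draw_exercises(bank, level, rng=random.Random(0))
print(level, len(exercises))
```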
Stepwise regression is a method of multiple linear regression analysis in which variables are introduced into the model one at a time. An F test is performed each time an explanatory variable is introduced, and t tests are then performed one by one on the explanatory variables already selected; if a previously introduced explanatory variable is no longer significant because of variables introduced after it, it is deleted. This ensures that the regression equation contains only significant variables before each new variable is introduced. The specific steps are as follows: first, perform a univariate linear regression of the dependent variable on each independent variable in turn and rank the independent variables by their degree of influence on the dependent variable; then introduce the independent variables one by one in descending order of influence, testing the independent variables and the regression equation each time a new one is introduced; a variable is retained if it is significant and removed if it is not, until no further independent variable can be introduced.
The course content recommendation module is used for: training a recommendation model with the data set generated by the data enhancement module; and using the trained recommendation model as a course content recommender, which generates recommended course learning content according to each student's situation and pushes it to the student. In this embodiment, the student receives the recommended course learning content after finishing each chapter. The recommended content is either a section of that chapter that the student is advised to review for consolidation, or extension content.
In this embodiment, the course content recommendation module uses a stepwise regression model. Its inputs are the student characteristics, the course content, and the student's existing learning behavior data; its outputs are the predicted accuracy rates on post-lesson exercises of "normal" difficulty for each section of the chapter. The system pushes the sections whose predicted accuracy rate is below 60% to the student for consolidation; if every section is above 60%, the student is recommended the extension content of the chapter.
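The chapter-level rule can be sketched in a few lines. The names and the sample values are hypothetical. A section predicted at exactly 60% is treated here as not needing review, which is an assumption because the text only distinguishes "lower than" and "higher than" 60%.

```python
from typing import Dict, List

EXTENSION = "extension content"  # placeholder label for the chapter's extension material


def recommend_content(section_accuracy: Dict[str, float], threshold: float = 0.60) -> List[str]:
    """Return the sections to review, or the extension content if none is weak."""
    weak = [section for section, accuracy in section_accuracy.items() if accuracy < threshold]
    return weak if weak else [EXTENSION]


# Hypothetical predicted accuracy on "normal" exercises for each section of a chapter.
chapter = {"section 1": 0.82, "section 2": 0.55, "section 3": 0.48}
print(recommend_content(chapter))                               # prints ['section 2', 'section 3']
print(recommend_content({"section 1": 0.9, "section 2": 0.7}))  # prints ['extension content']
```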
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (7)

1. An online teaching recommendation system with data enhancement, comprising: a data collection module, a data enhancement module, a post-lesson exercise recommendation module, and a course content recommendation module, wherein
the data collection module is used for receiving and storing basic student information entered by the students and, with the students' consent, collecting and storing the students' learning content information and learning behavior data; and for constructing a real data set based on the students' basic information, learning content information, and learning behavior data;
the data enhancement module generates a virtual data set from the real data set constructed by the data collection module, the data capacity of the virtual data set being greater than that of the real data set; wherein each piece of virtual data in the virtual data set is generated as follows:
step one: using the student information items included in the real data set as templates, obtaining the features of the real data set and dividing them into three categories: student characteristics, learning content characteristics, and learning behavior characteristics;
step two: training a virtual student generator by using the GAN and combining student characteristic data in the real data set;
step three: training a virtual learning content generator by using the GAN and combining with learning content characteristic data in the real data set;
step four: training a learning behavior generator of the student by using the CTGAN and the student characteristic data and the learning behavior characteristic data in the real data set;
step five: respectively inputting noise into the trained virtual student generator, the trained virtual learning content generator and the trained learning behavior generator of the students to sequentially obtain the characteristics of the virtual students, the virtual learning content and the learning behaviors of the students;
step six: combining the characteristics of the virtual students, the virtual learning content and the learning behaviors of the students to obtain a piece of virtual data, and adding the virtual data into a virtual data set;
the post-lesson exercise recommendation module is used for training a preset post-lesson exercise recommendation model based on the virtual data set constructed by the data enhancement module to obtain a trained post-lesson exercise recommendation model; after the student finishes the learning content of one unit, post-lesson exercise recommendation features matching the input of the post-lesson exercise recommendation model are extracted based on the learning content and the learning behavior of the student and input into the trained post-lesson exercise recommendation model, which generates recommended post-lesson exercises that are pushed to the student;
the course content recommendation module is used for training a preset post-lesson content recommendation model based on the virtual data set constructed by the data enhancement module to obtain a trained post-lesson content recommendation model; after the student finishes the learning content of a plurality of designated units, post-lesson content recommendation features matching the input of the post-lesson content recommendation model are extracted based on the learning content and the learning behavior of the student and input into the trained post-lesson content recommendation model, which generates recommended course learning content that is pushed to the student.
2. The system of claim 1, wherein the student basic information, the student learning content information, and the student learning behavior data are respectively:
the basic information of the student comprises: nationality, subject history, age, gender, student type, student score, student certificate of the student;
the learning content information of the student includes: course ID, course difficulty, after-class problem ID, after-class problem difficulty;
the learning behavior data of the student includes: course interaction times, course access days, video playing times, learning chapters, forum posts and post-lesson exercises.
3. The system of claim 2, wherein the student characteristics, learning content characteristics, and learning behavior characteristics are, respectively:
the student characteristics include: nationality, subject history, age, gender, student type, student score, student certificate of the student;
the learning content features include: course ID, course difficulty, after-class problem ID, after-class problem difficulty;
the learning behavior characteristics include: the number of course interactions, the number of days on which the course was accessed, the number of video plays, the number of chapters studied, the number of forum posts, and the accuracy rate on post-lesson exercises.
4. The system of claim 2 or 3, wherein the post-lesson exercise recommendation module generates recommended post-lesson exercises and pushes them to the student after the student completes each section of learning content; and the course content recommendation module generates recommended course learning content and pushes it to the student after the student completes the learning of each chapter, wherein the recommended course learning content is a section of the chapter that the student is advised to review for consolidation, or extension content.
5. The system according to any one of claims 1 to 4, wherein training the virtual student generator by using the GAN in combination with the student characteristic data in the real data set specifically comprises:
(1) setting up a GAN network:
setting up a generator network G that takes noise data as input and outputs virtual student characteristics;
setting up a discriminator network D that takes student characteristics as input and outputs the probability that the currently input student characteristics are real data, the last layer of the discriminator network D using a Sigmoid activation function;
(2) inputting data, and performing deep learning training on the GAN network parameters:
(2-1) randomly sampling k noise data Z_1, Z_2, …, Z_k from a normal distribution and k pieces of real data X_1, X_2, …, X_k, and generating k pieces of virtual data G(Z_1), G(Z_2), …, G(Z_k) from the k noise data through the generator network G;
updating the network parameters θ_D of the discriminator network D:
Figure FDA0002879206110000021
wherein Adam denotes the Adam optimization algorithm, γ denotes the learning rate, ∇_{θ_D} denotes differentiation with respect to θ_D, D() denotes the output of the discriminator network D, and G() denotes the output of the generator network G;
(2-2) randomly taking k noise data Z_1, Z_2, …, Z_k, generating k pieces of virtual data G(Z_1), G(Z_2), …, G(Z_k) through the generator network G, and updating the network parameters θ_G of the generator network G:
Figure FDA0002879206110000023
wherein ∇_{θ_G} denotes differentiation with respect to θ_G;
(2-3) cyclically executing steps (2-1) and (2-2) until a preset training end condition is met;
(3) taking the trained generator network G as the trained virtual student generator.
6. The system according to any one of claims 1 to 4, wherein training the learning behavior generator of the student by using the CTGAN and the student characteristic data and the learning behavior characteristic data in the real data set specifically comprises:
step 1: setting up a CTGAN network:
setting up a generator network G that takes student characteristics and learning content characteristics as input and outputs learning behavior characteristics;
setting up a discriminator network D that takes student characteristics, learning content characteristics, and learning behavior characteristics as input and outputs the probability that the input data is real data;
step 2: inputting data, and performing deep learning training on the CTGAN network parameters:
step 2-1: iteratively updating the network parameters θ_d of the discriminator network D over multiple loop executions;
wherein each iterative update of θ_d comprises b rounds of first sampling processing, and each round proceeds as follows:
based on the distribution P_r of the real data set, randomly sampling from P_r to obtain a sample x, wherein the sample x comprises student characteristics, learning content characteristics, and learning behavior characteristics;
based on the data distribution P_z of student characteristics and learning content characteristics, randomly sampling from P_z to obtain a sample z, and generating a learning behavior G(z) through the generator network G;
obtaining the data F(z) = (z, G(z)) from the learning behavior G(z) and the sample z;
according to the formula
Figure FDA0002879206110000031
obtaining the estimated sample x̂, where ε is sampled at random with ε ~ U[0,1];
according to the formula
Figure FDA0002879206110000033
calculating the loss function L^(i) of the i-th round in the current loop execution, wherein i = 1, 2, …, b;
D() denotes the output of the discriminator network D;
Figure FDA0002879206110000034
E[] denotes the mathematical expectation;
∇_x̂ denotes differentiation with respect to x̂;
Figure FDA0002879206110000037
wherein D(x′) and D(x″) denote the two outputs of the discriminator network D when x is input twice after a dropout layer has been added, and the function d() denotes the Euclidean distance between its two arguments;
ω1 and ω2 denote two preset weights;
based on the b rounds of first sampling processing, iteratively updating the network parameters θ_d:
Figure FDA0002879206110000038
wherein Adam denotes the Adam optimization algorithm, γ denotes the learning rate, b denotes the number of samples, and ∇_{θ_d} denotes differentiation with respect to θ_d;
step 2-2: randomly sampling b times from the data distribution P_z to obtain a sample set z^(i), where i = 1, 2, …, b;
inputting each sample z^(i) into the generator network G to obtain its output G(z^(i)), thereby obtaining F(z^(i)) = (z^(i), G(z^(i)));
then inputting each F(z^(i)) into the discriminator network D to obtain the output D(F(z^(i)));
updating the network parameters θ_g of the generator network G:
Figure FDA0002879206110000041
step 2-3: circularly executing the steps 2-1 and 2-2 until a preset training end condition is met;
step 3: taking the trained generator network G as the trained learning behavior generator of the virtual student.
7. The system of claim 1, wherein the data enhancement module further performs the following:
training a preset student dropout prediction model separately on the real data set and on the virtual data set to obtain two trained dropout prediction models; inputting real student data and course data into each of the two models; and comparing the output results of the two models to obtain a reliability result for the virtual data set.
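The comparison in claim 7 is not specified further. One simple possibility, shown purely as an assumption, is to report the fraction of real students for whom the model trained on real data and the model trained on virtual data give the same prediction. The function name and the sample outputs are hypothetical.

```python
from typing import Sequence


def agreement_rate(real_model_outputs: Sequence[int], virtual_model_outputs: Sequence[int]) -> float:
    """Fraction of inputs on which the two models give the same prediction."""
    if not real_model_outputs or len(real_model_outputs) != len(virtual_model_outputs):
        raise ValueError("both models must be evaluated on the same non-empty inputs")
    matches = sum(1 for a, b in zip(real_model_outputs, virtual_model_outputs) if a == b)
    return matches / len(real_model_outputs)


# Hypothetical predictions (1 = predicted to drop out) for the same five real students.
from_real_data_model = [1, 0, 0, 1, 1]
from_virtual_data_model = [1, 0, 1, 1, 1]
rate = agreement_rate(from_real_data_model, from_virtual_data_model)
print(rate)  # prints 0.8
```

A rate close to 1 would suggest that the virtual data set leads to much the same model behavior as the real one; any cut-off for "reliable" would have to be chosen separately.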
CN202011625667.2A 2020-12-31 2020-12-31 Online teaching recommendation system with data enhancement Active CN112784154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011625667.2A CN112784154B (en) 2020-12-31 2020-12-31 Online teaching recommendation system with data enhancement

Publications (2)

Publication Number Publication Date
CN112784154A true CN112784154A (en) 2021-05-11
CN112784154B CN112784154B (en) 2022-03-15

Family

ID=75754470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011625667.2A Active CN112784154B (en) 2020-12-31 2020-12-31 Online teaching recommendation system with data enhancement

Country Status (1)

Country Link
CN (1) CN112784154B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant