CN111275239B - Multi-mode-based networked teaching data analysis method and system

Info

Publication number: CN111275239B
Application number: CN201911329595.4A
Authority: CN
Inventors: 谢晖; 罗艳霞; 朱守平; 陈雪利; 詹勇华; 梁继民
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2019-12-20
Filing date: 2019-12-20
Publication date: 2023-09-29
Anticipated expiration: 2039-12-20
Also published as: CN111275239A

Abstract

The invention belongs to the technical field of data processing, and discloses a multi-mode-based networked teaching data analysis method and system, wherein the method and system adopt a maximum information coefficient MIC for feature screening to remove irrelevant factors; after feature screening by MIC analysis, reconstructing the screened features into a feature space, and carrying out regression by using a random forest to obtain a final evaluation model; the learning analysis technology and the data mining algorithm are combined, learning ability data, physiological data and learning behavior data generated by students in a theoretical course online learning platform are integrated and analyzed, a theoretical online course learning effect evaluation model is established, learning effects of the students are evaluated, and the evaluation result is output in the forms of charts, numbers and the like by the visualization technology. The invention establishes a theory course evaluation system of multi-mode information fusion by utilizing a machine learning technology, and provides theory and technical method support for an online course learning process.

Description

Multi-mode-based networked teaching data analysis method and system

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a multi-mode-based networked teaching data analysis method and system.

Background

Currently, the closest prior art is as follows. Networked teaching differs from the traditional face-to-face mode between teachers and students, so constructing an effective evaluation system model for online courses is of great significance. At present, a large body of research uses learning analysis techniques such as association analysis, regression analysis and data mining algorithms to collect, measure, analyze and report students' learning behavior data, in order to understand and optimize the teaching process and its context, support teaching decisions and academic early warning, and improve teaching effectiveness.

However, the prior art mainly concerns the acquisition and analysis of student learning behavior data, and the resulting evaluation model is relatively fixed, so evaluation and prediction errors arise when it is applied to different student batches and environments. In online education, teachers and students are in a quasi-separated state and most students study alone in geographically separate locations. They therefore lack emotional attention from teachers, find it difficult to communicate deeply with other students, and do not experience the sense of classroom presence and group belonging of traditional education, which heightens their sense of loneliness and easily causes learning fatigue. Because students cannot obtain timely feedback, evaluation and encouragement from teachers and classmates during learning, anxiety arises easily; the accompanying changes in physiological signal indicators also affect the learning effect. In addition, students' learning ability, such as intelligence factors, meta-learning ability and intrinsic factors, plays an extremely important role throughout the learning process. It is therefore difficult to evaluate the learning effect of a theoretical course comprehensively and accurately from learning behavior data or learning ability data alone. Besides relying on too narrow an evaluation standard, a typical course evaluation is formed only after the whole teaching process has ended, so it lags behind and can hardly support positive intervention in the students' learning process.

In summary, the problems of the prior art are:

(1) The prior art mainly concerns the acquisition and analysis of student learning behavior data, and the resulting evaluation model is relatively fixed, so evaluation and prediction errors arise across different batches and environments.

(2) In existing online education, teachers and students are in a quasi-separated state and students mostly study alone in geographically separate locations. They lack emotional attention from teachers, find it difficult to communicate deeply with other students, and do not experience the sense of classroom presence and belonging of traditional education. This heightens their sense of loneliness, easily causes learning fatigue, and affects the learning effect.

(3) The prior art relies mainly on student learning behavior data and a single evaluation standard; the evaluation lags behind and can hardly support positive intervention in the students' learning process.

The difficulty of solving the technical problems:

The learner's learning ability data is obtained by questionnaire, which requires that the features covered by the questionnaire be comprehensive and that the subjective psychological factors of the respondents be taken into account.

Modeling multi-modal data from these features must balance model accuracy against the risk of overfitting. A rigorous research framework is needed during modeling to ensure that the criteria are valid. When screening and quantifying the importance of each factor in the learning evaluation process, the criteria must not be too strict; otherwise the model easily overfits the in-sample data and often fails to generalize beyond it.

No single machine learning model stays the best for long, so the question is how to find the best solution for the task at hand. In general, model fusion improves the final predictive power to some degree and is usually no worse than the best sub-model. This requires considering several machine learning modeling methods and building on them a fused model that is ultimately used for prediction.

Directly applying an evaluation model built on the first batch of data to later students makes the evaluation results unstable. Learners continuously generate dynamic big data during learning; this enriches the feature data but also complicates feature analysis. Determining how the same features from different batches influence the final model, comparing models built from different combinations of batches, and screening and quantifying features according to those models all require a rigorous logical structure.

Significance of solving the technical problems:

Features are the basis of modeling. Good features contribute strongly to the final model, while poor features bias its results, so acquiring learning features is very important both for modeling accuracy and for later use of the analysis results. Screening and quantifying features effectively and avoiding overfitting when the model is built improve the model's accuracy on out-of-sample data. Optimizing the model in real time with the big data that learners continuously generate during learning can further improve its predictive power.

Disclosure of Invention

Aiming at the problems existing in the prior art, the invention provides a multi-mode-based networked teaching data analysis method and system.

The invention is realized in such a way that the method for analyzing the multi-mode-based networked teaching data comprises the following steps:

firstly, adopting the maximum information coefficient MIC to perform feature screening and remove irrelevant factors; calculating the correlation coefficients of the feature space X and the achievement space S column by column, and selecting the feature corresponding to the maximum correlation coefficient as the first feature $f_1$; calculating the MIC values between the feature $f_1$ and the other features, and selecting the feature corresponding to $L_{\max}$ as the second feature $f_2$; removing the first feature, and repeating the above steps until a sufficient number of features is obtained;

secondly, after feature screening is carried out by utilizing MIC analysis, the screened features are recombined into a feature space, regression is carried out by utilizing random forests, and a final evaluation model is obtained;

thirdly, adopting a method of combining a learning analysis technology and a data mining algorithm to integrate and analyze learning ability data, physiological data and learning behavior data generated by learning of students on a theoretical course online learning platform, establishing a theoretical online course learning effect evaluation model, evaluating learning effects of the students, and outputting evaluation results in the forms of charts, numbers and the like by adopting a visualization technology.

Further, the multi-modal data feature screening of the multi-mode-based networked teaching data analysis method comprises the following steps: the online learning score of the student's theoretical course is given as specific quantitative data through online practical operation and an online test before the end of each lesson; physiological signal data are collected and transmitted by a smart bracelet, and the mean and variance of the collected signal sequence are taken as the feature $x_{bio}$; learning behavior data ($x_{bah}$) are collected through the theoretical course online learning platform; learning ability data ($x_{iq}$) are all obtained by electronic questionnaire before the first theoretical lesson starts; the feature vector of each student is denoted $X_i = (x_{bio}, x_{bah}, x_{iq})$, the feature space of all students is denoted $X = (X_1, X_2, \ldots, X_n)^T$, where $n$ is the number of students, and the corresponding achievement space is denoted $S = (S_1, S_2, \ldots, S_n)^T$. First, variance normalization is applied to all features, and then the relation between each factor and the theoretical course learning score is analyzed.

Furthermore, the multi-mode-based networked teaching data analysis method adopts the maximum information coefficient MIC to perform feature screening and remove irrelevant factors, with the following specific steps: first, the correlation coefficients $P = (p_1, p_2, \ldots, p_n)$ between the feature space $X$ and the achievement space $S$ are calculated column by column, and the feature corresponding to the maximum correlation coefficient $P_{\max}$ is selected as the first feature, say $f_1 = X_k$; then the MIC values between the feature $f_1$ and the other features, $M = (m_1, \ldots, m_{k-1}, m_{k+1}, \ldots, m_n)$, are calculated; let $L = 0.5 \times P_{(i \neq k)} + 0.5 \times 0.5 \times (1 - M)$, and the feature corresponding to $L_{\max}$ is selected as the second feature $f_2$; the first feature is removed, and the above steps are repeated until a sufficient number of features is obtained or the maximum of the current MIC values is smaller than a certain threshold.

Further, establishing the multi-modal data initial evaluation model of the multi-mode-based networked teaching data analysis method comprises the following steps: the random forest consists of N decision trees; during training, the training samples of each decision tree are obtained by Bootstrap sampling from the original training set, and the features used at each node of a decision tree are likewise randomly sampled from the new feature space X; each decision tree is split recursively according to a judgment criterion; after the N decision trees have been trained, the average of the values of the leaf nodes in which a student falls is the final regression result;

Bootstrap sampling draws n samples with replacement from a set of n samples to form a data set.

Further, a single decision tree is built by recursive splitting as follows: the extracted sample set D forms the root node and is split into two parts D1 and D2 according to the judgment criterion; a left subtree is built recursively from the sample set D1 and a right subtree from the sample set D2; a condition for stopping splitting is set, and when splitting cannot continue the node is marked as a leaf node and assigned a value.

Further, the judgment criterion is implemented as follows: the regression error of the root node is calculated, i.e. the mean square error between the label values of all samples and the regression value;

a feature $X_u$ is randomly drawn from the new feature space X, and the extracted training samples D are sorted in ascending order of the value of that feature; the score of each student is taken in turn as a threshold that divides the samples into a left part and a right part, and the mean square sum error of the left subtree and of the right subtree is then calculated; the error index of a split is defined as the regression error before the split minus the regression errors of the left and right subtrees after the split:

$E = E(D) - E(D_1) - E(D_2)$;

the split that maximizes this index is adopted and splitting continues; splitting stops when the manually set depth of the decision tree is reached or the calculated regression error is greater than a manually set threshold, the node is then set as a leaf node, and the value of the leaf node is set to the average of the label values of the node's sample set; this completes the training of a single decision tree.

Further, the multi-modal evaluation model verification of the multi-mode-based networked teaching data analysis method comprises the following steps: first, the students' learning effects are graded: a score below 60 is grade 5; a score of 60-70 is grade 4; a score of 70-80 is grade 3; a score of 80-90 is grade 2; a score of 90-100 is grade 1; second, the final model obtained from the multi-modal data is used to predict each student's final score, which is graded by the same method; the predicted grades are compared with the final grades of the four classes of students, and the accuracy is calculated to verify the final model; that is, the score predicted by the model is assigned grade 5, 4, 3, 2 or 1 according to the preset intervals <60, 60-70, 70-80, 80-90 and 90-100, and the final score actually obtained by the student is likewise assigned grade 5, 4, 3, 2 or 1; a student whose two grades agree is marked 1 and a student whose grades differ is marked 0, and the proportion of students marked 1 is the model prediction accuracy.

Furthermore, in the LSTM network structure of the multi-modal evaluation model of the multi-mode-based networked teaching data analysis method, the input layer is followed by an LSTM layer whose number of hidden units can be tuned experimentally; two fully connected layers follow, with dropout added between them to improve trainability; finally, a regression layer completes the prediction of student scores.

Another object of the present invention is to provide a multi-modality based networked teaching data analysis system for implementing the multi-modality based networked teaching data analysis method, the multi-modality based networked teaching data analysis system comprising:

the feature screening module is used for carrying out feature screening by adopting the maximum information coefficient MIC to remove irrelevant factors;

the evaluation model acquisition module is used for reconstructing the screened characteristics into a characteristic space after the characteristic screening by utilizing MIC analysis, and carrying out regression by utilizing a random forest to obtain a final evaluation model;

and the evaluation result output module is used for integrating and analyzing learning ability data, physiological data and learning behavior data generated by learning of students on the theoretical course online learning platform by adopting a method of combining a learning analysis technology and a data mining algorithm, establishing a theoretical online course learning effect evaluation model, evaluating the learning effect of the students, and outputting the evaluation result in a chart and digital form by adopting a visualization technology.

The invention further aims to provide an information data processing terminal applying the multi-mode-based networked teaching data analysis method.

In summary, the invention has the advantages and positive effects that: according to the invention, through various learning behavior data provided by a theoretical course learning platform, physiological signals collected by an intelligent bracelet and learning ability information of students are fused, and a multi-mode information database of the learning process of the students is established; by utilizing a machine learning technology, a theoretical course evaluation system with multi-mode information fusion is established, and theoretical and technical method support is provided for the precision, personalized real-time evaluation and teaching scheme dynamic adjustment of an online course learning process.

Drawings

Fig. 1 is a flowchart of a multi-mode-based networked teaching data analysis method according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of a multi-mode-based networked teaching data analysis system according to an embodiment of the present invention;

in the figure: 1. a feature screening module; 2. an evaluation model acquisition module; 3. and the evaluation result output module.

Detailed Description

The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Aiming at the problems existing in the prior art, the invention provides a multi-mode-based networked teaching data analysis method and system, and the invention is described in detail below with reference to the accompanying drawings.

As shown in fig. 1, the method for analyzing multi-mode-based networked teaching data provided by the embodiment of the invention comprises the following steps:

S101: feature screening is performed with the maximum information coefficient MIC to remove irrelevant factors; the correlation coefficients of the feature space X and the achievement space S are calculated column by column, and the feature corresponding to the maximum correlation coefficient is selected as the first feature $f_1$; the MIC values between the feature $f_1$ and the other features are calculated, and the feature corresponding to $L_{\max}$ is selected as the second feature $f_2$; the first feature is removed, and the above steps are repeated until a sufficient number of features is obtained;

S102: after feature screening by MIC analysis, the screened features are recombined into a feature space, and regression is performed with a random forest to obtain the final evaluation model;

S103: learning analysis technology and data mining algorithms are combined to integrate and analyze the learning ability data, physiological data and learning behavior data that students generate on the theoretical course online learning platform; a theoretical online course learning effect evaluation model is established, the students' learning effects are evaluated, and the evaluation results are output as charts, numbers and the like using visualization technology.

As shown in fig. 2, the multi-mode-based networked teaching data analysis system provided by the embodiment of the invention includes:

the feature screening module 1 is used for carrying out feature screening by adopting the maximum information coefficient MIC to remove irrelevant factors;

the evaluation model acquisition module 2 is used for reconstructing the screened characteristics into a characteristic space after the characteristic screening by utilizing MIC analysis, and carrying out regression by utilizing a random forest to obtain a final evaluation model;

and the evaluation result output module 3 is used for integrating and analyzing learning ability data, physiological data and learning behavior data generated by learning the students on the theoretical course online learning platform by adopting a method of combining a learning analysis technology and a data mining algorithm, establishing a theoretical online course learning effect evaluation model, evaluating the learning effect of the students, and outputting the evaluation result in the forms of charts, numbers and the like by adopting a visualization technology.

The technical scheme of the invention is further described below with reference to specific embodiments.

1. Theory course teaching design and multi-mode data acquisition

1.1 Teaching activity design:

Teaching activities are carried out on the provincial virtual simulation demonstration course project "Microbiological Virtual Simulation Experiment" built by the applicant's team. The whole online course is divided into 10 weeks. Weeks 1-3: basic professional experiments, online theory tests, online practical operation tests and self-evaluation; weeks 4-6: professional experiments, online theory tests, online practical operation tests and self-evaluation; weeks 7-9: comprehensive innovation experiments (in groups), online theory tests, online practical operation tests, self-evaluation and peer evaluation; week 10: final online theory assessment, final online practical operation assessment, self-evaluation, peer evaluation and teacher evaluation. This teaching activity design is used to collect four-dimensional theoretical course score data and learning behavior data.

Learning ability data acquisition is completed based on a questionnaire scale:

Score data for prerequisite courses are obtained by connecting the project database directly to the score system of the university's academic affairs office. Learning ability data are collected mainly through an independently designed questionnaire system, before the student's first lesson starts. The intelligence evaluation relies on the internationally recognized Wechsler Adult Intelligence Scale and related scales: big-data analysis is performed on the intelligence evaluation data from an online test platform, the scores of some test items are weighted, and a deviation IQ algorithm (a relative intelligence quotient that takes the average intelligence quotient as the reference and the standard deviation as the unit) is introduced so that the intelligence evaluation is more accurate. The PC/mobile electronic questionnaire used to survey intrinsic factors and meta-learning ability factors was compiled from established measurement instruments at home and abroad and from an earlier series of surveys, adapted to the actual situation of the university.
The questionnaire mainly covers: gender (male, female), grade (1-4), discipline (philosophy, economics, law, education, literature, history, science, engineering, agronomy, medicine, military science, management, art), home location (first-, second-, third- or fourth-tier city, county/town, township/village), parents' education (illiterate, primary school, middle/high school, junior college, bachelor's degree, postgraduate), teaching environment (very good, average, poor, very poor), achievement motivation (scale score), self-efficacy (scale score), metacognition (scale score), achievement attribution (scale score), personality traits (scale score), etc., and is therefore highly comprehensive. The meta-learning ability factors (achievement motivation, self-efficacy, metacognition, achievement attribution and personality traits) are scored from 0 to 100 according to the questionnaire scales, which facilitates subsequent quantitative analysis and modeling.
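The deviation IQ mentioned above can be illustrated with a short sketch. It assumes the common Wechsler convention of a mean of 100 and a standard deviation of 15; the source does not state which constants or which item weights it uses, so `center`, `scale` and the function name are illustrative assumptions, and the item weighting step is omitted.

```python
from statistics import mean, pstdev


def deviation_iq(raw_scores, center=100.0, scale=15.0):
    """Express each raw test score relative to the cohort average,
    measured in standard deviations (center/scale are assumed constants)."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    if sigma == 0:
        # Every student scored the same: all sit exactly at the average.
        return [center for _ in raw_scores]
    return [center + scale * (x - mu) / sigma for x in raw_scores]


# Three hypothetical raw test scores.
iq = deviation_iq([40, 50, 60])
print(iq)
```

A student at the cohort average receives exactly `center`, and scores above or below the average are shifted in proportion to their distance in standard deviations.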

1.2 Learning behavior data acquisition based on the theoretical course online platform:

The learning behavior factors build on earlier research results. Eighteen variables were preliminarily selected, including virtual simulation module usage time, pre-experiment preview time, in-experiment discussion time, post-experiment review time, after-class task completion time, self-evaluation, evaluation by experiment group members, overall experiment evaluation (by the teacher), learning website visits, time spent reading discussions, platform posting time, online test time, course update checks, extended knowledge learning time and search tool usage time. Because the learning behavior big data platform was built earlier by the project leader and provides an open data transmission interface, the variables required by the project can be imported automatically into the multi-modal database and exported in various file formats such as Excel and Word.

2. Multi-modal evaluation model construction based on machine learning method

2.1 Multi-modal data feature screening based on the maximum information coefficient:

The online learning score of the student's theoretical course is given as specific quantitative data through online practical operation and an online test before the end of each lesson (10 times in total). Physiological signal data are collected and transmitted by a smart bracelet, and the mean and variance of the collected signal sequence are taken as the feature $x_{bio}$; learning behavior data ($x_{bah}$) are collected through the theoretical course online learning platform; learning ability data ($x_{iq}$) are all obtained by electronic questionnaire before the first theoretical lesson starts. The feature vector of each student is denoted $X_i = (x_{bio}, x_{bah}, x_{iq})$, the feature space of all students is denoted $X = (X_1, X_2, \ldots, X_n)^T$, where $n$ is the number of students, and the corresponding achievement space is denoted $S = (S_1, S_2, \ldots, S_n)^T$. First, variance normalization is applied to all features, and then the relation between each factor and the theoretical course learning score is analyzed.
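A minimal sketch of how the feature vectors and the variance normalization described above might look. It assumes that "variance normalization" means scaling each feature column to zero mean and unit variance; the function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def bio_features(signal):
    """Summarise one physiological signal sequence by its mean and variance (x_bio)."""
    return [mean(signal), pvariance(signal)]


def student_vector(signal, behaviour, ability):
    """Concatenate x_bio, x_bah and x_iq into one feature vector X_i."""
    return bio_features(signal) + list(behaviour) + list(ability)


def zscore_columns(X):
    """Scale every column of X to zero mean and unit variance (assumed meaning
    of 'variance normalization'); constant columns are mapped to zeros."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


# Hypothetical data: heart-rate samples, one behaviour variable, one ability score.
X = [
    student_vector([60, 62, 64], [10.0], [100.0]),
    student_vector([70, 70, 70], [20.0], [110.0]),
    student_vector([80, 84, 88], [30.0], [120.0]),
]
Z = zscore_columns(X)
```

After normalization the columns are on a common scale, so features measured in minutes, counts or scale scores can be compared against the course score without one unit dominating.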

The invention adopts the maximum information coefficient MIC for feature screening to remove irrelevant factors. The specific steps are as follows: first, the correlation coefficients $P = (p_1, p_2, \ldots, p_n)$ between the feature space $X$ and the achievement space $S$ are calculated column by column, and the feature corresponding to the maximum correlation coefficient $P_{\max}$ is selected as the first feature, say $f_1 = X_k$; then the MIC values between the feature $f_1$ and the other features, $M = (m_1, \ldots, m_{k-1}, m_{k+1}, \ldots, m_n)$, are calculated; let $L = 0.5 \times P_{(i \neq k)} + 0.5 \times 0.5 \times (1 - M)$, and the feature corresponding to $L_{\max}$ is selected as the second feature $f_2$. The first feature is removed and the above steps are repeated until a sufficient number of features is obtained (an empirical value that can be determined experimentally) or the maximum of the current MIC values falls below a certain threshold, such as the mean of the MIC values corresponding to the first feature.
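The screening loop can be sketched as follows, under several stated assumptions. The absolute Pearson correlation is used both for the relevance $p_i$ and, purely as a stand-in, for the MIC between two features; a real MIC estimator could be passed as `dependence`. The weights follow the coefficients as printed (0.5 and 0.5 × 0.5) but are exposed as parameters, and "removing the first feature" is read as using the most recently selected feature as the new reference. All names and data are hypothetical.

```python
from math import sqrt


def abs_pearson(a, b):
    """Absolute Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def screen_features(columns, scores, max_features, dependence=abs_pearson,
                    w_rel=0.5, w_red=0.25, mic_threshold=None):
    """Greedy screening: pick the most relevant feature, then repeatedly pick
    the candidate maximising L = w_rel * p_i + w_red * (1 - m_i)."""
    p = [abs_pearson(col, scores) for col in columns]
    selected = [max(range(len(columns)), key=lambda i: p[i])]
    while len(selected) < min(max_features, len(columns)):
        ref = columns[selected[-1]]  # assumed reading: last chosen feature is the reference
        rest = [i for i in range(len(columns)) if i not in selected]
        m = {i: dependence(ref, columns[i]) for i in rest}
        if mic_threshold is not None and max(m.values()) < mic_threshold:
            break  # remaining features are only weakly related to the reference
        selected.append(max(rest, key=lambda i: w_rel * p[i] + w_red * (1.0 - m[i])))
    return selected


# Hypothetical feature columns (one list per feature) and course scores.
columns = [
    [1, 2, 3, 4, 5],    # strongly related to the scores
    [1, 2, 3, 5, 5],    # nearly the same information as the first column
    [1, -1, 0, -1, 1],  # weakly related to the scores, unrelated to the first column
]
scores = [64, 66, 80, 86, 104]
print(screen_features(columns, scores, max_features=2))
print(screen_features(columns, scores, max_features=2, w_red=0.5))
```

The second call shows the design trade-off: with a larger weight on $(1 - m_i)$ the loop prefers a feature that adds new information over one that largely repeats the feature already chosen.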

3. Establishing a multi-mode data initial evaluation model based on random forests:

the learning analysis technology and the data mining algorithm are combined, learning ability data, physiological data and learning behavior data generated by learning a microbiological virtual simulation experiment on a theoretical course online learning platform are integrated and analyzed, a theoretical online course learning effect evaluation model is established, the learning effect of the student is evaluated, and the evaluation result is output in the forms of charts, numbers and the like by applying a visualization technology. After feature screening by MIC analysis, the screened features are recombined into a feature space, and regression is performed by using random forests to obtain a final evaluation model. The specific method comprises the following steps:

The random forest consists of N decision trees (N is set manually). During training, the N decision trees are trained in a loop: the training samples of each decision tree are sampled from the original training set (all students), and the features used at each node of a decision tree are randomly sampled from the new feature space X. Each decision tree is split recursively according to a judgment criterion. After the N decision trees have been trained, the average of the values of the leaf nodes in which a student falls is the final regression result.

Bootstrap sampling draws n samples with replacement from a set of n samples to form a data set, so a sample in the original set may appear several times or not at all.
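As a small illustration of Bootstrap sampling, the sketch below draws n items with replacement from n students and lists those that were never drawn. The seed and the student identifiers are arbitrary choices for the example.

```python
import random


def bootstrap_sample(samples, rng):
    """Draw len(samples) items from samples with replacement."""
    n = len(samples)
    return [samples[rng.randrange(n)] for _ in range(n)]


rng = random.Random(0)          # fixed seed so the example is repeatable
students = list(range(10))      # hypothetical student identifiers
drawn = bootstrap_sample(students, rng)
out_of_bag = sorted(set(students) - set(drawn))  # students never drawn
print(drawn, out_of_bag)
```

Because each tree sees a different resampled set, the trees differ from one another, which is what makes averaging their outputs useful.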

A single decision tree is built by recursive splitting as follows: the extracted sample set D forms the root node and is split into two parts D1 and D2 according to the judgment criterion; a left subtree is built recursively from the sample set D1 and a right subtree from the sample set D2; a condition for stopping splitting is set, and when splitting cannot continue the node is marked as a leaf node and assigned a value.

The specific implementation method of the decision criterion (assuming that the decision starts from the root node) is as follows: the regression error of the root node, i.e. the mean square sum error of the label values of all samples and the regression value (the average of all sample label values of this node) is calculated as:

random extraction of features X from new feature space X _u Sequencing the extracted training samples D (student achievements) according to the value of the characteristic from small to large; sequentially taking the score of each student as a threshold value, dividing the sample into a left part and a right part, and then calculating the mean square sum error of the left subtree and the right subtree; the error index for a split is defined as the regression error before the split minus the regression error of the left and right subtrees after the split:

E = E(D) - E(D_1) - E(D_2);

The split that maximizes this index is adopted and splitting continues; splitting stops when the manually set decision tree depth is reached or when the calculated regression error is larger than the manually set threshold, in which case the node is set as a leaf node whose value is the average of the label values of the node's sample set. This completes the training of a single decision tree.
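The threshold search for a single feature can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented implementation: the regression error E(.) is taken to be the unnormalized sum of squared deviations from the node mean, samples whose feature value is less than or equal to the threshold go to the left part, and the names `sse` and `best_split` and the sample data are hypothetical.

```python
def sse(labels):
    """Sum of squared deviations of labels from their mean (0 for an empty list)."""
    if not labels:
        return 0.0
    mean = sum(labels) / len(labels)
    return sum((y - mean) ** 2 for y in labels)


def best_split(feature_values, labels):
    """Return (threshold, index) maximizing E = E(D) - E(D_1) - E(D_2) for one feature."""
    pairs = sorted(zip(feature_values, labels))  # ascending by feature value
    parent_error = sse([y for _, y in pairs])
    best_threshold, best_index = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # identical feature values cannot be separated by a threshold
        threshold = pairs[i - 1][0]  # values <= threshold go to the left part
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        index = parent_error - sse(left) - sse(right)
        if index > best_index:
            best_threshold, best_index = threshold, index
    return best_threshold, best_index


# Hypothetical data: one feature value and one final score per student.
study_hours = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
final_scores = [55.0, 58.0, 60.0, 85.0, 88.0, 90.0]
threshold, index = best_split(study_hours, final_scores)
print(threshold, index)
```

On this toy data the search separates the three low-scoring students from the three high-scoring ones. A full tree would repeat the search recursively on each part until a stopping condition is met.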

4. Practical verification of the multi-mode evaluation model: verification of the accuracy of the model's predicted score grading and of the online course learning evaluation model:

Based on the physiological index, learning behavior and learning ability data of the second batch of students in the microbiology virtual simulation experiment course (the first batch of data is used to construct the evaluation model), the online course learning evaluation fusion model is applied and its accuracy is verified. The various data of students are collected on the virtual simulation learning platform according to the key data items in the model.

The feasibility and effectiveness of the online course learning evaluation model are verified as follows. First, the learning effects of students are graded: a score below 60 is set as grade 5; a score of 60-70 is set as grade 4; a score of 70-80 is set as grade 3; a score of 80-90 is set as grade 2; a score of 90-100 is set as grade 1. Second, the final model obtained from the multi-mode data is used to predict each student's final score, which is graded by the same method. The predicted grades are then compared with the final grading results of the four classes of students, and the accuracy is calculated to verify the final model. Specifically, the model-predicted scores are set to 5, 4, 3, 2 and 1 according to the preset grading intervals (<60, 60-70, 70-80, 80-90, 90-100), and the final scores actually obtained by the students are set to 5, 4, 3, 2 and 1 in the same way; a student whose two grades are consistent is marked 1 and a student whose grades are inconsistent is marked 0, and the proportion marked 1 is the model prediction accuracy. This demonstrates the consistency between the activity variables finally determined by the regression and the determined online course learning effect evaluation indexes.
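The grading and accuracy calculation can be sketched in Python as follows. The function names and the sample scores are hypothetical. Because the stated intervals share their end points, the sketch assumes that a boundary score belongs to the higher interval (for example, 60 falls in 60-70).

```python
def grade(score):
    """Map a score on the 0-100 scale to the grades 5 (lowest) to 1 (highest)."""
    if score < 60:
        return 5
    if score < 70:
        return 4
    if score < 80:
        return 3
    if score < 90:
        return 2
    return 1


def grade_accuracy(predicted_scores, actual_scores):
    """Proportion of students whose predicted and actual grades are consistent."""
    marks = [1 if grade(p) == grade(a) else 0
             for p, a in zip(predicted_scores, actual_scores)]
    return sum(marks) / len(marks)


# Hypothetical scores for four students.
predicted = [58.0, 72.5, 88.0, 91.0]
actual = [55.0, 69.0, 85.0, 95.0]
accuracy = grade_accuracy(predicted, actual)
print(accuracy)  # 0.75: three of the four grade pairs are consistent
```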

4.1 Data feedback system design based on a WeChat applet:

Student data acquired from the mobile device and the questionnaire are matched with student personal information (student number) and arranged into a data list; a login interface is designed in a WeChat applet to obtain the student's personal information (student number) and match it against the data list; the corresponding achievement forecast is then displayed in the WeChat applet interface. The data feedback system is designed with the WeChat applet development tool: WXML program files implement the interface design, including components such as pictures and buttons; WXSS program files implement the styling of text, sizes and the like; and JS program files implement the user interaction design and button functions. After the program has been debugged and put online, and in combination with the background multimodal database information and the multimodal evaluation model calculation, content is pushed to the WeChat interfaces of the student end and the teacher end, including the four types of achievements, statistics generated as data histograms, learning schemes and teaching strategy suggestions, so that interaction is convenient and the content is clear at a glance.

5. Evaluation model optimization based on deep learning:

In the initial stage of the experiment, the number of accumulated students is small and the data size is small, so a random forest is used to train the model that predicts and evaluates student performance. As the data size gradually increases, the evaluation model can be trained by a deep learning method to predict and evaluate student performance more accurately. The invention adopts a Long Short-Term Memory (LSTM) network model, whose forgetting mechanism and storage mechanism model the learning state information recorded for a student in each lesson more effectively, so that a more accurate evaluation model can be obtained than with the random forest method. In the LSTM network structure designed by the invention, the input layer is followed by an LSTM layer whose number of hidden units can be optimized through experiments, followed by two fully connected layers with dropout added between them to improve trainability, and finally a regression layer completes the prediction of student results.
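The data flow through such a network can be sketched in plain Python. This is a forward pass only, with small random weights standing in for trained parameters, so its output is not a meaningful prediction. Several details are assumptions not stated in the text: the class name `TinyLSTMRegressor`, the layer sizes, the ReLU activation after the first fully connected layer, the dropout rate, and the treatment of the regression layer as the single linear output of the second fully connected layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, bias, vec):
    """Fully connected layer: weights applied to vec, plus bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


class TinyLSTMRegressor:
    def __init__(self, n_in, n_hidden, n_dense, dropout=0.5, seed=0):
        self.rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.dropout = dropout
        # One (weights, bias) pair per gate, applied to [input, previous hidden state].
        self.gates = {
            name: (rand_matrix(n_hidden, n_in + n_hidden, self.rng), [0.0] * n_hidden)
            for name in ("forget", "input", "candidate", "output")
        }
        self.fc1 = (rand_matrix(n_dense, n_hidden, self.rng), [0.0] * n_dense)
        self.fc2 = (rand_matrix(1, n_dense, self.rng), [0.0])

    def lstm_step(self, x, h, c):
        z = x + h  # list concatenation of the input and the previous hidden state
        f = [sigmoid(v) for v in linear(*self.gates["forget"], z)]       # what to forget
        i = [sigmoid(v) for v in linear(*self.gates["input"], z)]        # what to store
        g = [math.tanh(v) for v in linear(*self.gates["candidate"], z)]  # candidate memory
        o = [sigmoid(v) for v in linear(*self.gates["output"], z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def predict(self, sequence, training=False):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in sequence:  # one feature vector per lesson
            h, c = self.lstm_step(list(x), h, c)
        a = [max(0.0, v) for v in linear(*self.fc1, h)]  # first fully connected layer
        if training:  # dropout is active only during training
            keep = 1.0 - self.dropout
            a = [v / keep if self.rng.random() < keep else 0.0 for v in a]
        return linear(*self.fc2, a)[0]  # second fully connected layer, regression output


# Hypothetical per-lesson feature vectors for one student.
lessons = [[0.2, 0.5, 0.1], [0.4, 0.6, 0.3], [0.9, 0.7, 0.2]]
model = TinyLSTMRegressor(n_in=3, n_hidden=4, n_dense=5)
prediction = model.predict(lessons)
print(prediction)
```

In practice the weights would be fitted with a deep learning framework by minimizing a regression loss such as the mean squared error between predicted and actual scores.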

6. Evaluation results

6.1 Multi-modal model establishment

The effective features of online virtual simulation learning are summarized based on multi-modal data integration, as shown in Table 1 below (names are hidden for privacy protection):

Table 1: Online virtual simulation experiment learning data collection with 20 modalities (sample number 86)

Student learning traces are stored in database logs by technical means, the large volume of log data is screened, and an initial data set of the students' online learning activity variables is preliminarily determined. There are 20 data features in total. To prove that the 20 indexes are positively correlated with students' online learning effect, scatter diagrams are drawn from the data generated by 237 students in virtual simulation experiment learning to verify whether the learning ability factors and the learning effect are correlated. To determine whether the 20 indexes are core indexes for the final evaluation of students' online course learning effect, bivariate correlation analysis is performed between the 20 indexes and the students' online course learning effect. The analysis shows that the significance values of the 20 initially selected indexes are all smaller than 0.05, i.e. the indexes are significantly and positively correlated with the evaluated learning effect of the students. To make the learning effect evaluation indexes of the online virtual simulation experiment course more reliable, a machine learning method is further used to analyze the data set of the 20 indexes, and the 20 learning behavior indexes are found to influence the learning effect of the virtual simulation experiment to a certain extent, as shown in Table 2.

Table 2: Coefficients of influence of the core features of online virtual simulation learning on learning efficiency

6.2 multimodal model application

Based on the constructed model, learning effect prediction based on multi-mode big data is carried out in parallel teaching classes; the specific results are shown in Table 3 below. The results show that the final score predicted by the deep-learning-based intelligent evaluation model has a high degree of matching with the final examination score of the students, and reaches the percentage. The invention relies on the various learning behavior data of students in the online learning process, draws on the results of earlier research on biology education informatization theory and on big data mining, fuses student learning ability information, and establishes a multi-mode information database of students in the learning process; it establishes a three-mode (meta-cognition, learning ability, learning behavior) virtual simulation experiment course evaluation system model by a machine learning method; and it uses the evaluation model to dynamically monitor and evaluate students' learning effect in real time during the subsequent virtual simulation experiment learning of a large number of students, while continuously optimizing the evaluation model with machine learning technology, thereby realizing accurate and personalized real-time evaluation, monitoring, early warning and dynamic adjustment of learning schemes throughout the online learning process.

Table 3: Comparison of examination scores and intelligent predictions for the parallel-class virtual simulation experiment (sample number 109)


The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims

1. The multi-mode-based networked teaching data analysis method is characterized by comprising the following steps of:

thirdly, integrating and analyzing learning ability data, physiological data and learning behavior data generated by learning of students on a theoretical course online learning platform by adopting a method of combining a learning analysis technology and a data mining algorithm, establishing a theoretical online course learning effect evaluation model, evaluating learning effects of the students, and outputting evaluation results in a chart and a digital form by adopting a visualization technology;

the multi-mode-based networked teaching data analysis method adopts the maximum information coefficient MIC to carry out feature screening and remove irrelevant factors, with the following specific steps: first, the correlation coefficients P = (p_1, p_2, ..., p_n) between the feature space X and the achievement space S are calculated column by column, and the feature corresponding to the maximum correlation coefficient P_max is selected as the first feature, denoted f_1 = X_k; then the MIC values M = (m_1, ..., m_{k-1}, m_{k+1}, ..., m_n) of feature f_1 are calculated; let L = 0.5×P(i≠k) + 0.5×0.5×1-M, and the feature corresponding to L_max is selected as the second feature f_2; the first feature is then removed, and the above steps are repeated until a sufficient number of features is obtained or the maximum value of the current MIC is smaller than a certain threshold.

2. The multi-modal based networked teaching data analysis method as claimed in claim 1, wherein the multi-modal data feature screening of the multi-modal based networked teaching data analysis method includes:

specific quantitative data for the students' online learning scores in the theoretical course are given through online practical operation and an online test before the end of each lesson; physiological signal data are collected and transmitted by an intelligent bracelet, and the mean and variance of the collected signal sequence are taken as the feature x_bio; learning behavior data (x_bah) are collected through the theoretical course online learning platform; learning ability data (x_iq) are obtained in full in the form of an electronic questionnaire before the first theoretical lesson begins; the feature vector of each student is denoted X_i = (x_bio, x_bah, x_iq), the feature space of all students is denoted X = (X_1, X_2, ..., X_n)^T, where n is the number of students, and the corresponding achievement space is denoted S = (S_1, S_2, ..., S_n)^T; variance normalization is first carried out on all the features, and the relation between each factor and the theoretical course learning score is then analyzed.

3. The method for analyzing multi-modal based networked teaching data as claimed in claim 1, wherein the establishing of the multi-modal data initial evaluation model of the multi-modal based networked teaching data analysis method includes: the random forest is composed of N decision trees, during training, training samples of each decision tree are obtained by Bootstrap sampling from an original training set, characteristics used during training of each node of the decision tree are also obtained by random sampling from a new characteristic space X, each decision tree carries out recursion splitting according to a judgment criterion, and after the training of the N decision trees is completed, the average value of leaf nodes where each student is located is the final regression result;

4. A multi-modal based networked teaching data analysis method as claimed in claim 3 wherein the single decision tree is implemented by a recursive splitting process as follows: the extracted sample set D forms a root node, and the sample set is split into two parts D1 and D2 according to a judgment criterion; recursively building a left subtree with a sample set D1, and building a right subtree with a sample set D2; setting a condition for stopping splitting, and marking the node as a leaf node and assigning values when the splitting cannot be continued.

5. The multi-modal based networked teaching data analysis method as claimed in claim 3, wherein the specific implementation method of the decision criteria is as follows: the regression error of the root node, i.e. the mean square error of the label values and the regression values of all samples, is calculated as:

a feature X_u is randomly extracted from the new feature space X, and the extracted training samples D are sorted in ascending order of the value of this feature; each student's value is taken in turn as a threshold to divide the samples into a left part and a right part, and the mean square sum error of the left subtree and of the right subtree is then calculated; the error index for a split is defined as the regression error before the split minus the regression errors of the left and right subtrees after the split:

E = E(D) - E(D_1) - E(D_2);

6. The multi-modal based networked teaching data analysis method as claimed in claim 1, wherein the multi-modal evaluation model verification method of the multi-modal based networked teaching data analysis method comprises: firstly, grading the learning effects of students: a score below 60 is set as grade 5; a score of 60-70 is set as 4; a score of 70-80 is set as 3; a score of 80-90 is set as 2; a score of 90-100 is set as 1; secondly, predicting the final score of each student by using the final model obtained from the multi-mode data, and grading it according to the above grading method; comparing the predicted grades with the final grading results of the four classes of students, and calculating the accuracy so as to verify the accuracy of the final model; setting the model-predicted scores as 5, 4, 3, 2 and 1 according to the preset grading intervals <60, 60-70, 70-80, 80-90 and 90-100, setting the final scores actually obtained by the students as 5, 4, 3, 2 and 1 according to the same grading, marking a student as 1 if the two grades are consistent and as 0 if they are inconsistent, and calculating the proportion marked as 1, namely the model prediction accuracy.

7. The method for analyzing the multi-modal based networked teaching data according to claim 1, wherein the LSTM network structure of the multi-modal evaluation model of the multi-modal based networked teaching data analyzing method is an LSTM layer after an input layer, the number of hidden layer units is optimized through experiments, two fully connected layers are connected, dropout is added between the two fully connected layers to improve trainability, and finally prediction of student performance is completed through a regression layer.

8. A multi-modality based networked teaching data analysis system for implementing the multi-modality based networked teaching data analysis method of any of claims 1 to 7, characterized in that the multi-modality based networked teaching data analysis system comprises:

9. An information data processing terminal applying the multi-mode-based networked teaching data analysis method according to any one of claims 1 to 7.