CN112699933B - Automatic identification method and system for processing capability of user teaching materials - Google Patents

Publication number
CN112699933B
Authority
CN
China
Prior art keywords
user
teaching
processing
average
usage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011583583.7A
Other languages
Chinese (zh)
Other versions
CN112699933A (en)
Inventor
吴砥
陈敏
李亚婷
徐建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN202011583583.7A
Publication of CN112699933A
Application granted
Publication of CN112699933B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Abstract

The invention discloses a method and a system for automatically identifying a user's teaching material processing capability based on multi-source data fusion. The method comprises the following steps: S1, predefining the attributes of the user's teaching material processing capability and the feature variables contained in each attribute, wherein the attributes comprise richness, diversity, availability, usefulness and timeliness; S2, collecting user data from the teaching platform, and calculating a teaching material processing capability matrix of each user from the user data; S3, acquiring sample data of a user set; S4, constructing regression models based on a plurality of machine learning methods, training the regression models with the sample data, and determining an optimal regression model; S5, dynamically identifying the user's teaching material processing capability with the trained optimal regression model. The invention enables intelligent automatic identification of the user's teaching material processing capability.

Description

Automatic identification method and system for processing capability of user teaching materials
Technical Field
The invention belongs to the field of education informatization, and particularly relates to a method and a system for automatically identifying processing capacity of user teaching materials based on multi-source data fusion.
Background
With the development of computer technology, teaching platforms that assist teaching have become important information carriers in teaching. Such platforms include, but are not limited to, regional education resource public service platforms, online teaching platforms, online teaching-research platforms, online training platforms and education management platforms. In platform-based teaching, identifying the teaching material processing capability of users is very important.
At present, the teaching material processing capability of users is still identified mainly through questionnaires, for example self-evaluation by the user through a scale or test questions. This approach considers only the current state of the user, the investigation is subjective to a certain degree and requires a high level of cooperation, and it ignores the objective process data generated when users process teaching materials, so identification is inaccurate and inefficient and data utilization is low. How to use computer technology to achieve more objective, more accurate and more continuous intelligent automatic identification based on the data users generate on a teaching platform is an important problem. The prior art offers no mature computer-based automatic identification technology.
Disclosure of Invention
Aiming at at least one defect or improvement requirement of the prior art, the invention provides a method and a system for automatically identifying a user's teaching material processing capability based on multi-source data fusion, which can realize intelligent automatic identification of the user's teaching material processing capability.
In order to achieve the above object, according to a first aspect of the present invention, there is provided a method for automatically identifying a user's teaching material processing capability based on multi-source data fusion, applied to a teaching platform supporting processing or management of teaching materials, including the steps of:
S1, predefining the attributes of the user's teaching material processing capability, wherein each attribute comprises feature variables;
S2, collecting user data from the teaching platform, carrying out multi-source data fusion on the user data with analysis methods based on the behavior, content and social dimensions, and determining the values of the feature variables of the attributes of each user's teaching material processing capability, wherein the tuple formed by the values of all feature variables of all attributes of a user's teaching material processing capability forms that user's teaching material processing capability matrix;
S3, selecting a user set, acquiring the teaching material processing capability matrix set corresponding to the user set, and further acquiring the manually labeled capability label set of the user set;
S4, constructing a plurality of regression models based on a plurality of machine learning methods, wherein each regression model outputs an identified capability label from an input teaching material processing capability matrix, training the regression models with the teaching material processing capability matrix set and the capability label set, and determining an optimal regression model;
S5, dynamically identifying the user's teaching material processing capability with the trained optimal regression model.
Preferably, the attribute of the processing capability of the teaching materials comprises richness, diversity, usability, usefulness and timeliness;
the richness is used for representing the quantity distribution characteristics of teaching materials in different file formats;
the diversity is used for representing the application of teaching materials and the distribution characteristics of processing types;
the availability is used for representing the use characteristics of the teaching materials by an uploader of the teaching materials;
the usefulness is used for representing the approval characteristics of other people except the uploader of the teaching materials on the teaching materials;
the timeliness is used for representing fluctuation characteristics of the updating frequency of the teaching materials.
Preferably, the richness comprises 4 feature variables of picture richness, audio richness, video richness and animation richness;
The diversity comprises 3 characteristic variables of usage diversity, processing type diversity and theme diversity;
the availability comprises 5 characteristic variables of average usage amount, maximum usage amount, self-usage total amount, student usage total amount and usage mode;
the usefulness comprises 13 characteristic variables of average sharing quantity, average transmission quantity, transmission rate, average collection quantity, maximum collection quantity, average downloading quantity, maximum downloading quantity, acceptance rate, average grading, used centrality, used category, comment emotion tendency and comment centrality;
the timeliness includes 2 characteristic variables of update frequency and volatility.
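The five attributes and their feature variables can be collected into one fixed column order, which is a convenient way to check that the counts above add up to the 27 inputs used later by the neural network. The following is a minimal sketch under stated assumptions: the short symbol names follow those used in the detailed description, while the dictionary layout and the helper names `FEATURES`, `feature_order` and `to_vector` are illustrative and not part of the patent.

```python
# Feature catalogue built from the attribute lists in the text.
# The layout and helper names are illustrative assumptions.
FEATURES = {
    "richness": ["R_picture", "R_audio", "R_video", "R_animation"],
    "diversity": ["D_use", "D_process", "D_topic"],
    "availability": ["U_average", "U_max", "U_self", "U_student", "U_pattern"],
    "usefulness": [
        "Q_share", "Q_diffuse", "Q_diffuse_rate", "Q_collection",
        "Q_mcollect", "Q_download", "Q_mdown", "Q_acceptance", "Q_score",
        "Q_udegree", "Q_utype", "Q_emotion", "Q_cdegree",
    ],
    "timeliness": ["T_fre", "T_vol"],
}


def feature_order():
    """Flatten the catalogue into one fixed column order."""
    return [name for names in FEATURES.values() for name in names]


def to_vector(values):
    """Turn a {feature name: value} mapping into an ordered list of floats."""
    return [float(values[name]) for name in feature_order()]
```

With this ordering, 4 + 3 + 5 + 13 + 2 = 27 values describe one user.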
Preferably, the user data comprises user basic data, teaching material basic data, teaching material tag data, teaching material use behavior data, teaching material scoring behavior data and teaching material comment behavior data;
the user basic data comprises user id, user name, user role, user gender, user age, region, school type, school stage taught and discipline taught;
the teaching material basic data comprises teaching material id, teaching material name, material form, material use and processing type;
the teaching material tag data comprises teaching material id, tag name and tag weight;
the teaching material use behavior data comprises use behavior id, user, use behavior action, teaching material, behavior time and behavior source;
the teaching material scoring behavior data comprises scoring behavior id, user, teaching material, score and behavior time;
the teaching material comment behavior data comprises comment behavior id, user, teaching material, comment content and behavior time.
Preferably, the behavior-dimension analysis methods comprise descriptive statistical analysis and K-means cluster analysis, and are mainly used for calculating the feature variables picture richness, audio richness, video richness, animation richness, usage diversity, processing type diversity, average usage, maximum usage, self-usage total, student usage total, usage pattern, average sharing quantity, average propagation quantity, propagation rate, average collection quantity, maximum collection quantity, average download quantity, maximum download quantity, acceptance rate, average score, used category, update frequency and volatility;
the content-dimension analysis methods comprise multidimensional scaling analysis and emotion tendency analysis, and are mainly used for calculating the feature variables theme diversity and comment emotion tendency;
the social-dimension analysis methods comprise social network analysis, and are mainly used for calculating the feature variables used centrality and comment centrality.
Preferably, the step S3 includes the steps of:
S31, selecting the user set according to the dimensions of the region where the user is located, the school type, the school stage taught and the discipline taught, denoting the user set as U_teacher and its size as NU, and acquiring the teaching material processing capability matrix $X_i$ of each user in the user set to form the corresponding teaching material processing capability matrix set, denoted X, $X=(X_1,X_2,\ldots,X_i,\ldots,X_{NU})^T$, wherein $X_i \in$ U_teacher;
S32, acquiring the manually labeled capability label set of the user set U_teacher, denoted $Y_{u\_teacher}$, $Y_{u\_teacher}=(Y_1,Y_2,\ldots,Y_i,\ldots,Y_{NU})^T$, wherein $Y_i$ is the capability label of each user, $Y_i \in$ U_teacher;
The capability label $Y_i$ is determined from the user's self-labeling data and expert labeling data. First, the error value $e_i=|St_i-Se_i|$ between the user's self-labeling data $St_i$ and the first expert labeling data $Se_i$ is calculated. If $e_i$ is less than a set critical value E, the capability label $Y_i$ is the average of the two. If $e_i$ is greater than the set critical value E, second expert labeling data $Sa_i$ is acquired, the distances from $Sa_i$ to $St_i$ and to $Se_i$ are calculated, and the capability label $Y_i$ is the average of $Sa_i$ and the score at the smaller distance. The calculation formula is as follows:
$$Y_i=\begin{cases}\dfrac{St_i+Se_i}{2}, & e_i<E\\[4pt]\dfrac{Sa_i+St_i}{2}, & e_i>E,\ |Sa_i-St_i|<|Sa_i-Se_i|\\[4pt]\dfrac{Sa_i+Se_i}{2}, & e_i>E,\ |Sa_i-St_i|>|Sa_i-Se_i|\end{cases}$$
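The labeling rule described above can be sketched as a small function. This is an illustration under stated assumptions rather than the patented implementation: the function and parameter names are hypothetical, and two boundary cases the text leaves open are resolved by assumption (an error exactly equal to E is treated as a disagreement, and a tie between the two distances favors the self-labeled score).

```python
def capability_label(st, se, threshold, sa=None):
    """Fuse a self-labeled score `st` and a first expert score `se`.

    `sa` is the second expert score, needed only when the two disagree.
    """
    if abs(st - se) < threshold:
        # Scores agree closely enough: use their mean.
        return (st + se) / 2
    if sa is None:
        raise ValueError("second expert score required when scores disagree")
    # Assumption: on a distance tie, prefer the self-labeled score.
    nearer = st if abs(sa - st) <= abs(sa - se) else se
    return (sa + nearer) / 2
```

For example, with a threshold of 5, scores of 80 and 84 give a label of 82, while scores of 60 and 90 call for the second expert: a second expert score of 85 is closer to 90, so the label is 87.5.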
Preferably, the multiple regression models comprise a multiple linear regression model, a random forest regression model, a support vector machine regression model and a BP neural network regression model;
the multiple linear regression model fits a linear regression model by minimizing the sum of squared residuals between the value labels of the sample users and the predicted values of the linear model, and the value label is calculated by the following formula:
Figure BDA0002864911810000042
wherein Y is the value label, C is a constant, R_picture is the picture richness feature variable, R_audio is the audio richness feature variable, R_video is the video richness feature variable, R_animation is the animation richness feature variable, D_use is the usage diversity feature variable, D_process is the processing type diversity feature variable, D_topic is the theme diversity feature variable, U_average is the average usage feature variable, U_max is the maximum usage feature variable, U_self is the self-usage total feature variable, U_student is the student usage total feature variable, U_pattern is the usage pattern feature variable, Q_share is the average sharing quantity feature variable, Q_diffuse is the average propagation quantity feature variable, Q_diffuse_rate is the propagation rate feature variable, Q_collection is the average collection quantity feature variable, Q_mcollect is the maximum collection quantity feature variable, Q_download is the average download quantity feature variable, Q_mdown is the maximum download quantity feature variable, Q_acceptance is the acceptance rate feature variable, Q_score is the average score feature variable, Q_udegree is the used centrality feature variable, Q_utype is the used category feature variable, Q_emotion is the comment emotion tendency feature variable, Q_cdegree is the comment centrality feature variable, T_fre is the update frequency feature variable, and T_vol is the volatility feature variable,
Figure BDA0002864911810000051
and ω 1 ~ω 26 are the weight coefficients obtained by training, and ε is the error;
the random forest regression model is an algorithm model that uses CART decision trees as weak learners with randomly selected features; T weak learners are trained independently on T rounds of sampling, and the final result combines the regression results of the T weak learners by a weighted average;
the support vector machine regression model maps the input teaching material processing capability matrix into a high-dimensional feature space through a kernel function to perform the regression calculation of the value label, and the calculation formula of the value label is as follows:
Figure BDA0002864911810000052
wherein Y is the value label, [Figure BDA0002864911810000053] and α i are Lagrange coefficients, x is the feature variable of the input user teaching material processing attribute, [Figure BDA0002864911810000054] is the transposed form of the feature variable x i , [Figure BDA0002864911810000055] is the kernel function, satisfying [Figure BDA0002864911810000056], and b is a constant;
the BP neural network regression model is a three-layer neural network with an input layer, a hidden layer and an output layer, each layer consisting of a number of neurons, wherein the input layer has 27 neurons, the hidden layer has 9 neurons, and the output layer has 1 neuron that gives the value label; regression of the value label is realized through the full connection of the neurons.
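The 27-9-1 fully connected structure can be illustrated with a forward pass alone. This is a minimal sketch under stated assumptions: the text does not name the activation functions or the initialization, so a sigmoid hidden layer, a linear output and small random initial weights are assumed here, and the back-propagation training step is omitted. All function names are hypothetical.

```python
import math
import random


def init_network(n_in=27, n_hidden=9, seed=0):
    """Create small random weights for an n_in -> n_hidden -> 1 network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(network, x):
    """Map one 27-value feature vector to a single value label."""
    w1, b1, w2, b2 = network
    # Assumption: sigmoid activation in the hidden layer.
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
        for row, b in zip(w1, b1)
    ]
    # Assumption: linear output neuron, as is usual for regression.
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```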
Preferably, the step S4 includes the steps of:
dividing the sample data formed by the teaching material processing capability matrix set and the capability label set into k groups, taking 1 group as the test set and the remaining k-1 groups as the training set each time, and training the regression model over k rounds;
the evaluation value of training is the mean absolute percentage error of the regression model, denoted MAPE, which is calculated by:
$$\mathrm{MAPE}=\frac{1}{M}\sum_{j=1}^{M}\left|\frac{y'_j-y_j}{y_j}\right|\times 100\%$$
wherein M is the number of users in the test set sample, $y'_j$ is the predicted value of the capability label of teacher j, and $y_j$ is the true value of the capability label of teacher j;
and comparing the evaluation effects of different regression models, and determining the regression model with the minimum MAPE value as the optimal regression model.
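The k-fold evaluation and the MAPE-based choice of model can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, each candidate model is represented by a `fit(X, y)` function that returns a prediction function, samples are assigned to folds by index modulo k, and the per-fold MAPE values are averaged, a detail the text does not spell out.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((p - t) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold f holds samples with i % k == f."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def cross_validated_mape(fit, X, y, k):
    """Average the test-fold MAPE of one model over k folds (assumption)."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(
            mape([y[i] for i in test], [predict(X[i]) for i in test])
        )
    return sum(scores) / len(scores)


def select_best(candidates, X, y, k):
    """Return the name of the candidate with the lowest MAPE, plus all scores."""
    results = {
        name: cross_validated_mape(fit, X, y, k)
        for name, fit in candidates.items()
    }
    return min(results, key=results.get), results
```

In practice the four candidates named in the text (multiple linear regression, random forest, support vector regression, BP neural network) would each be wrapped as one `fit` function.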
Preferably, the method further comprises step S6:
collecting user update data at time t;
and dynamically updating the capability label of the user based on the user updating data and the trained optimal regression model.
According to a second aspect of the present invention, there is provided an automatic recognition system for processing capability of a user teaching material based on multi-source data fusion, applied to a teaching platform supporting processing or management of the teaching material, comprising:
the predefining module is used for predefining attributes of processing capacity of the user teaching materials and feature variables contained in each attribute;
the data acquisition module is used for collecting user data from the teaching platform, carrying out multi-source data fusion on the user data from the behavior, content and social dimensions to determine the values of the feature variables of the attributes of each user's teaching material processing capability, and forming each user's teaching material processing capability matrix from the tuple of the values of all feature variables of all attributes of that user's teaching material processing capability;
the sample acquisition module is used for setting screening conditions to select a user set, acquiring the teaching material processing capability matrix set corresponding to the user set, and further acquiring the manually labeled capability label set of the user set;
the training module is used for constructing a plurality of regression models, wherein each regression model outputs an identified capability label from an input teaching material processing capability matrix, and for training the regression models with the teaching material processing capability matrix set and the capability label set to determine an optimal regression model;
and the identification module is used for dynamically identifying the processing capacity of the user teaching materials by utilizing the trained optimal regression model.
In summary, the invention has the advantages and positive effects that:
(1) The multi-dimensional process data in the education platform can be fully utilized to realize intelligent automatic identification of the user's teaching material processing capability, which is objective, accurate and continuous; most of the time is spent on training the model in advance, and identification with the trained model is fast and efficient.
(2) In addition, a time dimension is introduced and automatic dynamic updating of the identification is supported, which facilitates large-scale, continuous evaluation work such as assessing users' teaching material processing capability and teachers' information literacy.
(3) The accuracy of intelligent automatic identification can be further improved by optimizing the capability attribute/feature variable and the collected data type.
Drawings
Fig. 1 is a general flowchart of a method for dynamically evaluating a user's teaching material processing capability based on multi-source data fusion according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
Fig. 1 shows a general flowchart of a method for automatically identifying processing capability of a user teaching material based on multi-source data fusion, which is applied to a teaching platform supporting processing or management of the teaching material, and includes the following steps:
S1, predefining the attributes of the user's teaching material processing capability and the feature variables contained in each attribute.
The attribute of the processing capability of the teaching materials comprises richness, diversity, availability, usefulness and timeliness;
the richness is used for representing the quantity distribution characteristics of teaching materials in different file formats;
the diversity is used for representing the application of teaching materials and the distribution characteristics of processing types;
the availability is used for representing the use characteristics of the teaching materials by an uploader of the teaching materials;
the usefulness is used for representing the approval characteristics of other people except the uploader of the teaching materials on the teaching materials;
the timeliness is used for representing fluctuation characteristics of the updating frequency of the teaching materials;
the richness comprises 4 characteristic variables of picture richness R_picture, audio richness R_audio, video richness R_video and animation richness R_animation;
The picture richness R_picture refers to the log-function standardized value of the number N_picture of picture teaching materials uploaded by a user, and the picture richness calculation formula for any user i is: $\mathrm{R\_picture}_i=\log_{10}(\mathrm{N\_picture}_i)$;
The audio richness R_audio refers to the log-function standardized value of the number N_audio of audio teaching materials uploaded by a user, and the audio richness calculation formula for any user i is: $\mathrm{R\_audio}_i=\log_{10}(\mathrm{N\_audio}_i)$;
The video richness R_video refers to the log-function standardized value of the number N_video of video teaching materials uploaded by a user, and the video richness calculation formula for any user i is: $\mathrm{R\_video}_i=\log_{10}(\mathrm{N\_video}_i)$;
The animation richness R_animation refers to the log-function standardized value of the number N_animation of animation teaching materials uploaded by a user, and the animation richness calculation formula for any user i is: $\mathrm{R\_animation}_i=\log_{10}(\mathrm{N\_animation}_i)$;
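The four richness variables share one computation, the base-10 logarithm of an upload count. The sketch below is illustrative only: the function names and the input layout are hypothetical, and because the text does not say how a count of zero is handled (log10 of zero is undefined), mapping it to 0.0 is an assumption of this example.

```python
import math


def richness(count):
    """Base-10 log of an upload count."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # Assumption: an empty upload history maps to 0.0, since log10(0)
    # is undefined and the text does not cover this case.
    return math.log10(count) if count > 0 else 0.0


def richness_features(counts):
    """`counts` maps a file format name to the number of uploads."""
    formats = ("picture", "audio", "video", "animation")
    return {f"R_{fmt}": richness(counts.get(fmt, 0)) for fmt in formats}
```

Note that under this assumption a user with one upload and a user with none both score 0.0; a platform that needs to tell them apart would need a different convention.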
The diversity comprises 3 characteristic variables: usage diversity D_use, processing type diversity D_process and theme diversity D_topic;
the usage diversity D_use refers to the proportion of the number N_use of usages of the teaching materials uploaded by the user to the total number Num_of_uses of teaching material usages, and the usage diversity calculation formula for any user i is:
$$\mathrm{D\_use}_i=\frac{\mathrm{N\_use}_i}{\mathrm{Num\_of\_uses}}$$
The processing type diversity D_process refers to the proportion of the number N_process of processing types of the teaching materials uploaded by the user to the total number Num_of_process of teaching material processing types, and the processing type diversity calculation formula for any user i is:
$$\mathrm{D\_process}_i=\frac{\mathrm{N\_process}_i}{\mathrm{Num\_of\_process}}$$
the theme diversity D_topic refers to the proportion of the number N_topic of themes of the teaching materials uploaded by the user to the total number Num_of_topic of teaching material themes, and the theme diversity calculation formula for any user i is:
$$\mathrm{D\_topic}_i=\frac{\mathrm{N\_topic}_i}{\mathrm{Num\_of\_topic}}$$
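Each of the three diversity variables is the number of distinct categories a user covers divided by the number of categories available on the platform. A minimal sketch, in which the function name and the example category labels are hypothetical:

```python
def diversity(values, total_categories):
    """Share of the available categories that appear in `values`.

    `values` holds one category label per uploaded teaching material,
    for example its usage, processing type or theme.
    """
    if total_categories <= 0:
        raise ValueError("total_categories must be positive")
    return len(set(values)) / total_categories
```

For instance, a user whose three uploads have the usages `["courseware", "exercise", "courseware"]` on a platform with four usages in total gets a usage diversity of 2 / 4 = 0.5.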
the availability comprises 5 characteristic variables of average usage U_average, maximum usage U_max, total self-usage U_self, total student usage U_student and usage mode U_pattern;
the average usage U_average refers to the ratio of the sum of the usage U_each of each teaching material uploaded by the user to the total number N_all of teaching materials uploaded by the user, and the average usage calculation formula for any user i is:
$$\mathrm{U\_average}_i=\frac{\sum_{j=1}^{\mathrm{N\_all}_i}\mathrm{U\_each}_{ij}}{\mathrm{N\_all}_i}$$
the maximum usage U_max refers to the log-function standardized value of the maximum of the usage U_each of each teaching material uploaded by the user, and the maximum usage calculation formula for any user i is:
$$\mathrm{U\_max}_i=\log_{10}\Big(\max_{j}\mathrm{U\_each}_{ij}\Big)$$
the self-usage total U_self refers to the log-function standardized value of the sum of the self-usage U_teacher of each teaching material uploaded by the user, and the self-usage total calculation formula for any user i is:
$$\mathrm{U\_self}_i=\log_{10}\Big(\sum_{j=1}^{\mathrm{N\_all}_i}\mathrm{U\_teacher}_{ij}\Big)$$
The student usage total U_student refers to the log-function standardized value of the sum of the student usage U_secret of each teaching material uploaded by the user, and the student usage total calculation formula for any user i is:
$$\mathrm{U\_student}_i=\log_{10}\Big(\sum_{j=1}^{\mathrm{N\_all}_i}\mathrm{U\_secret}_{ij}\Big)$$
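The four numeric availability variables can be computed from per-material usage counts as described above. The following sketch makes several assumptions that the text does not state: the function and parameter names are hypothetical, each input list holds one count per uploaded teaching material, and a zero total or maximum maps to 0.0 instead of an undefined logarithm. The clustered usage pattern U_pattern is not covered here.

```python
import math


def _log10_or_zero(value):
    # Assumption: a zero count maps to 0.0 because log10(0) is undefined.
    return math.log10(value) if value > 0 else 0.0


def availability(usage, self_usage, student_usage):
    """Compute U_average, U_max, U_self and U_student for one user.

    Each argument is a list with one count per uploaded teaching material.
    """
    if not usage:
        raise ValueError("the user has no uploaded teaching materials")
    return {
        "U_average": sum(usage) / len(usage),
        "U_max": _log10_or_zero(max(usage)),
        "U_self": _log10_or_zero(sum(self_usage)),
        "U_student": _log10_or_zero(sum(student_usage)),
    }
```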
the use mode U_pattern is a result of clustering the use modes of all users of the teaching platform for uploading teaching materials based on k-means;
the usefulness comprises 13 characteristic variables: average sharing quantity Q_share, average propagation quantity Q_diffuse, propagation rate Q_diffuse_rate, average collection quantity Q_collection, maximum collection quantity Q_mcollect, average download quantity Q_download, maximum download quantity Q_mdown, acceptance rate Q_acceptance, average score Q_score, used centrality Q_udegree, used category Q_utype, comment emotion tendency Q_emotion and comment centrality Q_cdegree;
the average sharing quantity Q_share refers to the ratio of the sum of the shared quantity Q_share_each of each teaching material uploaded by the user to the total number N_all of teaching materials uploaded by the user, and the average sharing quantity calculation formula for any user i is:
$$\mathrm{Q\_share}_i=\frac{\sum_{j=1}^{\mathrm{N\_all}_i}\mathrm{Q\_share\_each}_{ij}}{\mathrm{N\_all}_i}$$
the average propagation quantity Q_diffuse refers to the ratio of the sum of the quantity Q_diffuse_each of views that each teaching material uploaded by the user received through a sharing link to the total number N_all of teaching materials uploaded by the user, and the average propagation quantity calculation formula for any user i is:
$$\mathrm{Q\_diffuse}_i=\frac{\sum_{j=1}^{\mathrm{N\_all}_i}\mathrm{Q\_diffuse\_each}_{ij}}{\mathrm{N\_all}_i}$$
The propagation rate Q_diffuse_rate refers to the ratio of the average propagation quantity Q_diffuse of the teaching materials uploaded by the user to their average sharing quantity Q_share, and the propagation rate calculation formula for any user i is:
$$\mathrm{Q\_diffuse\_rate}_i=\frac{\mathrm{Q\_diffuse}_i}{\mathrm{Q\_share}_i}$$
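The sharing and propagation variables follow directly from two per-material counts. A minimal sketch under stated assumptions: the function and parameter names are hypothetical, and a user whose materials were never shared is given a propagation rate of 0.0, a case the text does not address.

```python
def propagation_features(shares, link_views):
    """Return (Q_share, Q_diffuse, Q_diffuse_rate) for one user.

    `shares[j]` is how often material j was shared; `link_views[j]` is how
    often it was viewed through a sharing link.
    """
    if not shares or len(shares) != len(link_views):
        raise ValueError("need one share count and one view count per material")
    n_all = len(shares)
    q_share = sum(shares) / n_all
    q_diffuse = sum(link_views) / n_all
    # Assumption: no shares at all means a propagation rate of 0.0.
    rate = q_diffuse / q_share if q_share else 0.0
    return q_share, q_diffuse, rate
```

For two materials shared 2 and 4 times and viewed 6 and 12 times through sharing links, the result is an average sharing quantity of 3, an average propagation quantity of 9 and a propagation rate of 3.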
the average collection quantity Q_collection refers to the ratio of the sum of the collection quantity Q_collection_each of each teaching material uploaded by the user to the total number N_all of teaching materials uploaded by the user, and the average collection quantity calculation formula for any user i is:
$$\mathrm{Q\_collection}_i=\frac{\sum_{j=1}^{\mathrm{N\_all}_i}\mathrm{Q\_collection\_each}_{ij}}{\mathrm{N\_all}_i}$$
the maximum collection quantity Q_mcollect refers to the log-function standardized value of the maximum of the collection quantity Q_collection_each of each teaching material uploaded by the user, and the maximum collection quantity calculation formula for any user i is:
$$\mathrm{Q\_mcollect}_i=\log_{10}\Big(\max_{j}\mathrm{Q\_collection\_each}_{ij}\Big)$$
the average download quantity Q_download refers to the ratio of the sum of the download quantity Q_download_each of each teaching material uploaded by the user to the total number N_all of teaching materials uploaded by the user, and the average download quantity calculation formula for any user i is:
$$\mathrm{Q\_download}_i=\frac{\sum_{j=1}^{\mathrm{N\_all}_i}\mathrm{Q\_download\_each}_{ij}}{\mathrm{N\_all}_i}$$
the maximum download quantity Q_mdown refers to the log-function standardized value of the maximum of the download quantity Q_download_each of each teaching material uploaded by the user, and the maximum download quantity calculation formula for any user i is:
$$\mathrm{Q\_mdown}_i=\log_{10}\Big(\max_{j}\mathrm{Q\_download\_each}_{ij}\Big)$$
The acceptance rate Q_acceptance refers to the ratio of the sum of the collection quantity Q_collection_each and the download quantity Q_download_each of each teaching material uploaded by the user to the browse quantity Q_browse_each of each teaching material, and the acceptance rate calculation formula for any user i is:
Figure BDA0002864911810000112
the average score Q_score refers to the ratio of the sum of the scores Q_score_each of each teaching material uploaded by the user to the total number N_all of teaching materials uploaded by the user, and the average score calculation formula for any user i is:
$$\mathrm{Q\_score}_i=\frac{\sum_{j=1}^{\mathrm{N\_all}_i}\mathrm{Q\_score\_each}_{ij}}{\mathrm{N\_all}_i}$$
the used centrality Q_udegree refers to the ratio of the total number U_use of other users who used any teaching material uploaded by the user to the number U of users minus one, and the used centrality calculation formula for any user i is:
$$\mathrm{Q\_udegree}_i=\frac{\mathrm{U\_use}_i}{U-1}$$
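Used centrality is a degree-centrality style ratio: the number of other users who used an uploader's materials, divided by the number of other users on the platform. The sketch below assumes that usage records are available as (user, owner) pairs and that each other user is counted once however often they used the materials; both the record layout and the function name are hypothetical.

```python
def used_centrality(usage_events, uploader, total_users):
    """Share of the other platform users who used `uploader`'s materials.

    `usage_events` is an iterable of (user, owner) pairs, where `owner`
    uploaded the teaching material that `user` used.
    """
    if total_users < 2:
        raise ValueError("need at least two users on the platform")
    # Assumption: count distinct other users, ignoring the uploader's own use.
    others = {
        user for user, owner in usage_events
        if owner == uploader and user != uploader
    }
    return len(others) / (total_users - 1)
```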
the used category Q_utype is a result of clustering the used modes of all users of the teaching platform for uploading teaching materials based on k-means;
the comment emotion tendency Q_emotion refers to the ratio of the sum of the positive emotion comments Q_emotion_each of each teaching material uploaded by the user to the total number N_all of teaching materials uploaded by the user, and the comment emotion tendency calculation formula for any user i is:
$$\mathrm{Q\_emotion}_i=\frac{\sum_{j=1}^{\mathrm{N\_all}_i}\mathrm{Q\_emotion\_each}_{ij}}{\mathrm{N\_all}_i}$$
The comment centrality Q_cdegree refers to the ratio of the total number U_comment of other users who commented on any teaching material uploaded by the user to the number U of users, and the comment centrality calculation formula for any user i is:
$$\mathrm{Q\_cdegree}_i=\frac{\mathrm{U\_comment}_i}{U}$$
the timeliness comprises 2 characteristic variables: update frequency T_fre and volatility T_vol;
the update frequency T_fre refers to the average of the number N_time of teaching materials uploaded by the user in each time period t within the time span T, and the update frequency calculation formula for any user i is:
$$\mathrm{T\_fre}_i=\frac{1}{T}\sum_{t=1}^{T}\mathrm{N\_time}_{it}$$
the volatility T_vol refers to the sum of squares of the differences between the proportion of the teaching materials N_time uploaded by the user in each time period t relative to the whole time span T and the reference percentage $B_t$ of teaching materials in time period t, and the volatility calculation formula for any user i is:
$$\mathrm{T\_vol}_i=\sum_{t=1}^{T}\left(\frac{\mathrm{N\_time}_{it}}{\sum_{s=1}^{T}\mathrm{N\_time}_{is}}-B_t\right)^2$$
in the embodiment of the invention, T=12 is selected, and the reference percentages of teaching materials provided for the 12 months are as follows: B= {8%,10%,8%,8%,8%, 10%,8%,8%,8%,8% };
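The two equalization variables can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the monthly counts are those reported for user 1 in the embodiment, and the uniform reference share is an assumption used only for illustration because the reference list B as printed does not show all 12 monthly values.

```python
from typing import Sequence


def update_frequency(uploads_per_period: Sequence[int]) -> float:
    """T_fre: mean number of uploads per sub-period t over the period T."""
    return sum(uploads_per_period) / len(uploads_per_period)


def volatility(uploads_per_period: Sequence[int],
               reference_share: Sequence[float]) -> float:
    """T_vol: sum of squared gaps between each sub-period's upload share and B_t."""
    if len(uploads_per_period) != len(reference_share):
        raise ValueError("one reference share is needed per sub-period")
    total = sum(uploads_per_period)
    if total == 0:
        # Assumption: a user with no uploads has share 0 in every sub-period.
        shares = [0.0] * len(uploads_per_period)
    else:
        shares = [n / total for n in uploads_per_period]
    return sum((s - b) ** 2 for s, b in zip(shares, reference_share))


# Monthly upload counts reported for user 1 over T = 12 months.
monthly_uploads = [10, 46, 0, 0, 1, 0, 14, 42, 22, 4, 0, 0]
# Assumption: a uniform reference share, NOT the reference list B of the embodiment.
uniform_reference = [1 / 12] * 12

t_fre = update_frequency(monthly_uploads)
t_vol = volatility(monthly_uploads, uniform_reference)
```

With these counts the update frequency is 139/12, which rounds to the 11.58 reported for teacher 1; the volatility depends on the reference list actually used.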
Step S2: collecting user data from the teaching platform, performing multi-source data fusion with analysis methods based on the behavior, content and social dimensions to determine the values of the characteristic variables of each attribute of each user's teaching material processing capability, and forming each user's teaching material processing capability matrix from the tuple of the values of all characteristic variables of all attributes.
The behavior-dimension analysis method comprises descriptive statistical analysis and K-means cluster analysis, and is mainly used to calculate the picture richness, audio richness, video richness, animation richness, usage diversity, processing type diversity, average usage, maximum usage, total self-usage, total student usage, usage pattern, average sharing amount, average propagation amount, propagation rate, average collection amount, maximum collection amount, average download amount, maximum download amount, acceptance rate, average score, used category, update frequency and volatility characteristic variables;
The content-dimension analysis method comprises multidimensional scaling analysis and emotion tendency analysis, and is mainly used to calculate the theme diversity and comment emotion tendency characteristic variables;
the social-dimension analysis method comprises social network analysis, and is mainly used to calculate the used centrality and comment centrality characteristic variables.
The teaching platform includes, but is not limited to, education application support platforms such as regional education resource public service platforms, online teaching platforms, online teacher training and research platforms, online training platforms and education management platforms; the embodiment of the invention adopts the network learning space of the education resource public service platform of Z Province, and the data were collected on August 30, 2019;
The user data comprises user basic data, teaching material label data, teaching material use behavior data, teaching material scoring behavior data and teaching material comment behavior data;
the user basic data comprise user id, user name, user role, user gender, user age, region, school type, school stage taught and subject taught, and can be expressed by U = (U_id, U_name, U_type, U_gender, U_age, U_area, U_school, U_section, U_subject);
the value range of the user role U_type comprises teachers, students and others, and can be expressed as follows: U_type = {u_teacher, u_student, u_other};
the value range of the user gender U_gender is {0,1}, wherein 0 represents female and 1 represents male;
the value range of the school type U_school comprises cities, counties and villages, and can be expressed as follows: U_school = {u_city, u_town, u_count};
the value range of the school stage taught U_section comprises elementary school, junior middle school, senior high school and none, and can be expressed as U_section = {u_primary, u_junior, u_high, 0}, wherein the school stage of students and other roles can only take the value 0;
The value range of the subject taught U_subject comprises Chinese, mathematics, English, physics, chemistry, biology, history, politics, geography, society, science, sports, music, art, health, law, information technology, comprehensive practice and none, and can be expressed as U_subject = {u_chinese, u_maths, u_english, u_physics, u_chemistry, u_biology, u_history, u_politics, u_geography, u_society, u_science, u_sports, u_music, u_arts, u_health, u_legal, u_information_technology, u_comprehensive_practice, 0}, wherein the subject taught of students and other roles can only take the value 0;
Table 1 is a partial example of the user basic data collected for users whose role is teacher in the embodiment of the present invention; there are 10625 such users in total;
Table 1 Example (partial) of user basic data acquisition results for users whose role is teacher
Figure BDA0002864911810000131
Figure BDA0002864911810000141
Wherein, teacher 1 is Teacher Li, a 34-year-old female primary-school mathematics teacher at a city school in S City, Z Province;
teacher 2 is a 33-year-old male biology teacher in S City, Z Province;
teacher 10625 is Teacher Zhao, a 45-year-old male science teacher at a city primary school in D City, Z Province;
The basic data of the teaching materials comprise teaching material id, teaching material name, material form, material use and processing type, and can be represented by M = (M_id, M_name, M_format, M_use, M_type);
the value range of the teaching material form M_format comprises picture, audio, video and animation, and can be expressed as follows: M_format = {m_picture, m_audio, m_video, m_animation};
the value range of the teaching material use M_use comprises pre-class preview, in-class use and after-class review, and can be expressed as follows: M_use = {m_before, m_in, m_after}, wherein the total number of teaching material uses Num_of_use = 3;
the value range of the teaching material processing type M_type comprises conversion, beautification, excerpting and integration, and can be expressed as follows: M_type = {m_convert, m_embellish, m_excerpt, m_integration}, wherein the total number of teaching material processing types Num_of_process = 4;
table 2 is a partial example of the basic data acquisition result of the teaching materials provided by the embodiment of the invention, wherein the total number of the teaching materials is 95348;
table 2 teaching materials basic data acquisition results (partial) example
M_id M_name M_format M_use M_type
1 Small × m_picture m_before m_convert
2 I' x m_picture m_in m_integration
... ... ... ... ...
95348 Class x m_video m_in m_excerpt
The teaching material with M_id 1 is a picture teaching material for pre-class preview that has undergone conversion processing;
The teaching material with M_id 2 is a picture teaching material for in-class use that has undergone integration processing;
the teaching material with M_id 95348 is a video teaching material for in-class use that has undergone excerpt processing;
the teaching material tag data comprises a teaching material id, a tag name and a tag weight, and can be represented by L= (M_id, L_name and L_weight);
the label weight L_weight represents the number of times the label occurs, and its value range is [0, +∞);
Table 3 is a partial example of the teaching material tag data collected in the embodiment of the present invention, wherein the total number of teaching material tags is 543325;
Table 3 Example (partial) of teaching material tag data acquisition results
M_id L_name L_weight
1 Geometry 5
2 Round shape 2
... ... ...
95348 Mathematics 10
Wherein, the teaching material with M_id 1 is tagged "Geometry" 5 times and "Round shape" 2 times;
the teaching material with M_id 95348 is tagged "Mathematics" 10 times;
the teaching material usage behavior data are process data on users uploading, browsing, collecting, downloading, using and sharing teaching materials; they comprise usage behavior id, user, behavior action, teaching material, behavior time and behavior source, and can be represented by B = (B_id, U, B_action, M, Time, B_source);
The value range of the usage behavior action B_action comprises uploading, browsing, collecting, downloading, using and sharing, and can be expressed as follows: B_action = {b_upload, b_browse, b_collect, b_download, b_use, b_share};
the value range of the behavior source B_source comprises searching, sharing and others, and can be expressed as follows: B_source = {b_shared, b_other};
Table 4 is a partial example of the process data collected on users uploading, browsing, collecting, downloading, using and sharing teaching materials in the embodiment of the present invention; there are 406576 usage behavior records in total;
Table 4 Example (partial) of teaching material usage behavior data acquisition results
Figure BDA0002864911810000161
The usage behavior with B_id 1 is that the user with U_id 1 browsed the teaching material with M_id 198, reached by searching, at 07:07:03 on September 1, 2018;
the usage behavior with B_id 2 is that the user with U_id 1 used the teaching material with M_id 198 at 07:08:21 on September 1, 2018;
the usage behavior with B_id 406576 is that the user with U_id 269 browsed the teaching material with M_id 1376, reached through sharing, at 23:23:13 on August 30, 2019;
the teaching material scoring behavior data comprise scoring behavior id, user, teaching material, score and behavior time, and can be represented by S = (S_id, U, M, S_score, Time);
The value range of the score S_score is [0, 5];
Table 5 is a partial example of the data collected on users' scoring of teaching materials; there are 107613 scoring behavior records in total;
Table 5 Example (partial) of teaching material scoring behavior data acquisition results
Figure BDA0002864911810000162
Figure BDA0002864911810000171
Wherein, the scoring behavior with S_id 1 is that the user with U_id 1 gave the teaching material with M_id 18 a score of 2 at 21:22:20 on September 3, 2018;
the scoring behavior with S_id 2 is that the user with U_id 1 gave the teaching material with M_id 1958 a score of 5 at 14:07:23 on September 18, 2018;
the scoring behavior of s_id 107613 is that a user with u_id 2239 scores teaching material with m_id 18723 for 4.5 at 2019, 8, 23, 58 minutes and 14 seconds;
the teaching material comment behavior data comprise comment behavior id, user, teaching material, comment content and behavior time, and can be represented by C = (C_id, U, M, C_comment, Time);
Table 6 is a partial example of the data collected on users' comments on teaching materials; there are 252123 comment behavior records in total;
Table 6 Example (partial) of teaching material comment behavior data acquisition results
C_id U M C_comment Time
1 U_id=1765 M_id=1 Very helpful 2018-09-01 14:12:25
2 U_id=8872 M_id=1 Not clear 2018-09-02 12:17:03
... ... ... ... ...
252123 U_id=22 M_id=91121 Support 2019-07-31 14:38:04
Wherein, the comment behavior with C_id 1 is that the user with U_id 1765 commented "very helpful" on the teaching material with M_id 1 at 14:12:20 on September 1, 2018;
the comment behavior with C_id 2 is that the user with U_id 8872 commented "not clear" on the teaching material with M_id 1 at 12:17:03 on September 2, 2018;
the comment behavior with C_id 252123 is that the user with U_id 22 commented "support" on the teaching material with M_id 91121 at 14:38:04 on July 31, 2019;
the intermediate variables involved in determining the values of the characteristic variables of each user's teaching material processing capability attributes comprise: the number of topics Num_of_topic of the teaching platform; the teaching resource usage pattern U_pattern; the teaching resource used category Q_utype; for user i, the total number of uploaded teaching materials N_all_i, the number of picture teaching materials N_picture_i, the number of audio teaching materials N_audio_i, the number of video teaching materials N_video_i, the number of animation teaching materials N_animation_i, the number of uses of the teaching materials N_use_i, the number of processing types of the teaching materials N_process_i, the number of topics of the teaching materials N_topic_i, the total number of users who have used the teaching materials U_use_i and the total number of users who have commented on the teaching materials U_comment_i; for teaching material n uploaded by user i, the usage amount U_each_{i,n}, the self-usage amount U_teach_{i,n}, the student usage amount U_secret_{i,n}, the shared amount Q_share_each_{i,n}, the amount browsed through shared links Q_diffuse_each_{i,n}, the browsed amount Q_browse_each_{i,n}, the collected amount Q_collect_each_{i,n}, the downloaded amount Q_download_each_{i,n}, the score Q_score_each_{i,n} and the positive-emotion comments Q_emotion_each_{i,n}; and the number of teaching materials N_time_{i,t} uploaded by user i in each sub-period t of the time period T;
The topic number num_of_topic of the teaching platform is obtained through multidimensional scale analysis of a teaching material label network, and the value is 20 in the embodiment of the invention;
the teaching material label network is an undirected network and can be represented by Gl = (L, El), wherein L represents all labels and El represents the co-occurrence relations among the labels;
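A minimal Python sketch of how such an undirected label network could be assembled is given below. It assumes that two labels are related when they are attached to the same teaching material; the function name, the edge-weight convention and the sample data are hypothetical and are not taken from the embodiment.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_tag_network(
    material_tags: Dict[int, Iterable[str]],
) -> Tuple[Set[str], Counter]:
    """Return (label set L, edge weights El) of an undirected label network.

    Assumption: an edge links two labels attached to the same material, and
    its weight is the number of materials on which both labels appear.
    """
    nodes: Set[str] = set()
    edges: Counter = Counter()
    for tags in material_tags.values():
        unique = sorted(set(tags))  # sorting gives each undirected edge one key
        nodes.update(unique)
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return nodes, edges


# Hypothetical label data keyed by M_id, for illustration only.
sample = {
    1: ["Geometry", "Round shape"],
    2: ["Geometry", "Mathematics"],
    3: ["Geometry", "Round shape"],
}
nodes, edges = build_tag_network(sample)
```

The resulting edge weights could then serve as the similarity input to the multidimensional scaling step; that step is not shown here.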
the teaching resource usage pattern U_pattern is the result of K-means clustering based on the users' average usage, maximum usage, total self-usage and total student usage, wherein the selected K value is 4;
the teaching resource used category Q_utype is the result of K-means clustering based on the users' average sharing amount, average propagation amount, average collection amount, maximum collection amount, average download amount and maximum download amount, wherein the selected K value is 4;
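The clustering step can be illustrated with a plain Lloyd's-algorithm K-means in Python. This is only a sketch under stated assumptions: the patent does not specify the initialization, distance or stopping rule, so the deterministic start from the first k points, the Euclidean distance and the sample feature vectors below are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int,
           iterations: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's algorithm; returns (cluster index per point, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: deterministic start from the first k points (no random seeding).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical users described by (average usage, maximum usage,
# total self-usage, total student usage); values are illustrative only.
users = [
    (0.1, 0.2, 0.1, 0.0), (2.0, 2.1, 0.2, 1.9),
    (0.2, 0.1, 2.0, 0.1), (1.0, 3.0, 1.0, 3.0),
    (0.0, 0.1, 0.2, 0.1), (2.1, 2.0, 0.1, 2.0),
    (0.1, 0.2, 2.1, 0.0), (1.1, 3.1, 0.9, 2.9),
]
labels, centroids = kmeans(users, k=4)
```

The cluster index assigned to a user would then play the role of U_pattern (or Q_utype when the sharing, collection and download features are clustered instead).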
The calculation formula of the total number N_all_i of teaching materials uploaded by user i is: N_all_i = |{B | B_action = b_upload, U = i}|;
The calculation formula of the number N_picture_i of picture teaching materials uploaded by user i is: N_picture_i = |{B | B_action = b_upload, U = i, M_format = m_picture}|;
The calculation formula of the number N_audio_i of audio teaching materials uploaded by user i is: N_audio_i = |{B | B_action = b_upload, U = i, M_format = m_audio}|;
The calculation formula of the number N_video_i of video teaching materials uploaded by user i is: N_video_i = |{B | B_action = b_upload, U = i, M_format = m_video}|;
The calculation formula of the number N_animation_i of animation teaching materials uploaded by user i is: N_animation_i = |{B | B_action = b_upload, U = i, M_format = m_animation}|;
The calculation formula of the number of uses N_use_i of the teaching materials uploaded by user i is: N_use_i = |{M_use | B_action = b_upload, U = i}|;
The calculation formula of the number of processing types N_process_i of the teaching materials uploaded by user i is: N_process_i = |{M_type | B_action = b_upload, U = i}|;
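The set-cardinality formulas above can be sketched in Python as follows. The flat record type joining a behavior record B with the basic data of its material M is a hypothetical simplification, and the sample log is invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class UploadRecord:
    """Hypothetical flat record: one behavior B joined with its material M."""
    user_id: int      # U
    action: str       # B_action, e.g. "b_upload"
    m_format: str     # M_format, e.g. "m_picture"
    m_use: str        # M_use, e.g. "m_before"
    m_type: str       # M_type, e.g. "m_convert"


def upload_counts(log: Iterable[UploadRecord], user_id: int) -> Dict[str, int]:
    """Count the uploads of one user overall, by form, and by distinct use/type."""
    uploads: List[UploadRecord] = [
        b for b in log if b.action == "b_upload" and b.user_id == user_id
    ]

    def count_format(fmt: str) -> int:
        return sum(1 for b in uploads if b.m_format == fmt)

    return {
        "N_all": len(uploads),
        "N_picture": count_format("m_picture"),
        "N_audio": count_format("m_audio"),
        "N_video": count_format("m_video"),
        "N_animation": count_format("m_animation"),
        # Set cardinality: distinct uses and distinct processing types.
        "N_use": len({b.m_use for b in uploads}),
        "N_process": len({b.m_type for b in uploads}),
    }


# Hypothetical log for illustration only.
sample_log = [
    UploadRecord(1, "b_upload", "m_picture", "m_before", "m_convert"),
    UploadRecord(1, "b_upload", "m_picture", "m_in", "m_convert"),
    UploadRecord(1, "b_upload", "m_video", "m_in", "m_excerpt"),
    UploadRecord(1, "b_browse", "m_audio", "m_after", "m_integration"),
    UploadRecord(2, "b_upload", "m_audio", "m_after", "m_integration"),
]
counts = upload_counts(sample_log, user_id=1)
```

Note that N_use and N_process count distinct values, whereas the per-form counts are record counts, mirroring the difference between the sets {M_use | ...} and {B | ...} in the formulas.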
The number of topics N_topic_i of the teaching materials uploaded by user i is determined from the number of platform topics to which the labels of those materials belong;
the total number of users U_use_i who have used the teaching materials uploaded by user i is obtained from the relative centrality of the user usage network of the teaching platform;
the user usage network is a directed network and can be represented by Gu = (U, Eu), wherein U represents all users and Eu represents the relation that user i uses a teaching resource of user j;
The total number of users U_comment_i who have commented on the teaching materials uploaded by user i is obtained from the relative centrality of the user comment network of the teaching platform;
the user comment network is a directed network and can be represented by Gc = (U, Ec), wherein U represents all users and Ec represents the relation that user i comments on a teaching resource of user j;
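As a hedged illustration, the used centrality can be computed from the edge list of the directed usage network as a relative in-degree: the number of distinct other users who used a user's materials, divided by the number of users minus one. The function name and sample edges are hypothetical, and excluding self-use is an assumption based on the phrase "used by others"; the same function applies to the comment network if its edges are passed instead.

```python
from typing import Dict, Iterable, Set, Tuple


def used_centrality(edges: Iterable[Tuple[int, int]],
                    all_users: Set[int]) -> Dict[int, float]:
    """Relative in-degree per user; an edge (i, j) means user i used a
    teaching resource uploaded by user j."""
    users_of: Dict[int, Set[int]] = {u: set() for u in all_users}
    for consumer, owner in edges:
        if consumer != owner:  # assumption: self-use is not use by others
            users_of.setdefault(owner, set()).add(consumer)
    denominator = len(all_users) - 1
    return {
        owner: (len(consumers) / denominator if denominator > 0 else 0.0)
        for owner, consumers in users_of.items()
    }


# Hypothetical network of five users, for illustration only.
platform_users = {1, 2, 3, 4, 5}
usage_edges = [(2, 1), (3, 1), (2, 1), (1, 1), (4, 2)]
centrality = used_centrality(usage_edges, platform_users)
```

Repeated uses by the same user collapse into one edge because the consumers are kept in a set, so U_use counts users rather than usage events.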
the calculation formula of the usage amount U_each_{i,n} of teaching material n uploaded by user i is: U_each_{i,n} = |{B | B_action = b_use, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the calculation formula of the self-usage amount U_teach_{i,n} of teaching material n uploaded by user i is: U_teach_{i,n} = |{B | B_action = b_use, M = n, U = i}|, where n ∈ {M | B_action = b_upload, U = i};
the calculation formula of the student usage amount U_secret_{i,n} of teaching material n uploaded by user i is: U_secret_{i,n} = |{B | B_action = b_use, M = n, U_type = u_student}|, where n ∈ {M | B_action = b_upload, U = i};
the calculation formula of the shared amount Q_share_each_{i,n} of teaching material n uploaded by user i is: Q_share_each_{i,n} = |{B | B_action = b_share, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the calculation formula of the amount Q_diffuse_each_{i,n} of teaching material n uploaded by user i that is browsed through shared links is: Q_diffuse_each_{i,n} = |{B | B_action = b_use, B_source = b_shared, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the calculation formula of the browsed amount Q_browse_each_{i,n} of teaching material n uploaded by user i is: Q_browse_each_{i,n} = |{B | B_action = b_browse, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the calculation formula of the collected amount Q_collect_each_{i,n} of teaching material n uploaded by user i is: Q_collect_each_{i,n} = |{B | B_action = b_collect, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the calculation formula of the downloaded amount Q_download_each_{i,n} of teaching material n uploaded by user i is: Q_download_each_{i,n} = |{B | B_action = b_download, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the calculation formula of the score Q_score_each_{i,n} of teaching material n uploaded by user i is: Q_score_each_{i,n} = {B_weight | B_action = b_score, M = n}, where n ∈ {M | B_action = b_upload, U = i};
the positive-emotion comments Q_emotion_each_{i,n} on teaching material n uploaded by user i are obtained by emotion tendency analysis in natural language processing: when the emotion tendency of a comment is analyzed as positive, the comment is regarded as a positive-emotion comment and counted as 1;
The calculation formula of the number N_time_{i,t} of teaching materials uploaded by user i in each sub-period t of the time period T is: N_time_{i,t} = |{B | B_action = b_upload, B_time ∈ t, U = i}|;
Table 7 and Table 8 are partial examples of the values underlying the attributes and attribute characteristic variables of users' teaching material processing capability provided in the embodiment of the present invention, where Table 7 gives the overall values for the teaching materials uploaded by a user, and Table 8 gives the specific values for each teaching material uploaded by the user;
Table 7 Example (partial) of overall values for user-uploaded teaching materials
Figure BDA0002864911810000201
Figure BDA0002864911810000211
For user 1, the total number of uploaded teaching materials is 139, the number of picture teaching materials is 110, the number of audio teaching materials is 4, the number of video teaching materials is 16, there are no animation teaching materials, the number of uses of the teaching materials is 2, the number of processing types is 2, the number of topics is 1, the teaching materials have been used by 20 other users and commented on by 12 users, and the numbers of teaching materials uploaded in each of the 12 months are 10, 46, 0, 0, 1, 0, 14, 42, 22, 4, 0, 0;
table 8 specific value (partial) examples of each teaching material uploaded by the user
Figure BDA0002864911810000212
Figure BDA0002864911810000221
For user 1, the uploaded teaching material 1 has a usage amount of 3, a self-usage amount of 3, a student usage amount of 0, a shared amount of 3, no recorded amount browsed through shared links, no recorded browsed amount, a collected amount of 1, a downloaded amount of 30, a score of 4 and 5 positive-emotion comments; the uploaded teaching material 139 has a usage amount of 100, a self-usage amount of 2, a student usage amount of 80, a shared amount of 1, no recorded amount browsed through shared links, no recorded browsed amount, a collected amount of 0, a downloaded amount of 63, a score of 4 and 2 positive-emotion comments;
The teaching material processing capability matrix X_i of each user i is formed as X_i = (R_picture_i, R_audio_i, R_video_i, D_use_i, D_process_i, D_topic_i, U_average_i, U_max_i, U_self_i, U_student_i, U_pattern_i, Q_share_i, Q_collect_i, Q_mcollect_i, Q_download_i, Q_mdownload_i, Q_score_i, Q_udegree_i, Q_utype_i, Q_emotion_i, Q_cdegree_i, T_fre_i, T_vol_i);
Taking teacher 1 as an example, the values of the teaching material processing capability evaluation matrix X_1 of teacher 1 are described;
the picture richness of teacher 1 is: R_picture_1 = log10(N_picture_1) = log10(119) = 2.08;
The audio richness of teacher 1 is: R_audio_1 = log10(N_audio_1) = log10(4) = 0.60;
The video richness of teacher 1 is: R_video_1 = log10(N_video_1) = log10(16) = 1.20;
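The three richness values above follow directly from the base-10 logarithm of the material counts, as the short Python sketch below shows. The function name is hypothetical, and returning 0.0 for a count of zero is an assumption, since the logarithm is undefined there and the example does not state how a form with no materials is handled.

```python
import math


def richness(count: int) -> float:
    """Richness of one material form: log10 of the number of such materials."""
    if count < 0:
        raise ValueError("a material count cannot be negative")
    # Assumption: a form with no materials gets richness 0.0 (log10(0) is undefined).
    return math.log10(count) if count > 0 else 0.0


r_picture = round(richness(119), 2)
r_audio = round(richness(4), 2)
r_video = round(richness(16), 2)
```

The rounded results reproduce the 2.08, 0.60 and 1.20 given for teacher 1.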
The purpose diversity value of the teacher 1 is as follows:
Figure BDA0002864911810000231
the processing type diversity value of the teacher 1 is as follows:
Figure BDA0002864911810000232
the theme diversity of teacher 1 takes the value:
Figure BDA0002864911810000233
the average usage value of teacher 1 is:
Figure BDA0002864911810000234
the maximum usage of teacher 1 takes the value of:
Figure BDA0002864911810000235
the total self-use amount of teacher 1 takes the value:
Figure BDA0002864911810000236
the total amount of student usage of teacher 1 is:
Figure BDA0002864911810000237
the use mode of the teacher 1 belongs to the first class according to the clustering result;
the average sharing value of the teacher 1 is:
Figure BDA0002864911810000238
the average collection value of the teacher 1 is:
Figure BDA0002864911810000239
the maximum collection value of the teacher 1 is:
Figure BDA00028649118100002310
the average download value of the teacher 1 is:
Figure BDA0002864911810000241
the maximum download amount of the teacher 1 takes the value of:
Figure BDA0002864911810000242
the average score value of teacher 1 is:
Figure BDA0002864911810000243
the used centrality of teacher 1 is 0.035;
the used category of teacher 1 belongs to the third class according to the clustering result;
the comment emotion tendency of teacher 1 is:
Figure BDA0002864911810000244
the comment centrality value of the teacher 1 is 0.003;
The update frequency of teacher 1 takes the value:
Figure BDA0002864911810000245
the volatility of teacher 1 takes the value:
Figure BDA0002864911810000246
in the embodiment of the invention, the multi-dimensional input matrix of teacher 1 is X_1 = (2.08, 0.60, 1.20, 0.67, 0.50, 0.05, 1.93, 2.00, 1.30, 2.33, 1, 0.07, 0.06, 0.00, 8.94, 1.80, 3.98, 0.035, 3, 0.51, 0.003, 11.58, 0.14);
Step S3: selecting a user set, acquiring the teaching material processing capability matrix set corresponding to the user set, and further acquiring the manually annotated capability label set of the user set, specifically comprising the following steps.
Step S31: selecting the user set according to the dimensions of the users' region, school type, school stage taught and subject taught, denoting the user set as U_teacher and its size as NU, and obtaining the teaching material processing capability matrix X_i of each user in the user set to form the corresponding teaching material processing capability matrix set, denoted X, X = (X_1, X_2, ..., X_i, ..., X_NU)^T, wherein X_i ∈ U_teacher;
The specific user set U_teacher constructed in the embodiment of the present invention consists of the administrative users in all regions, all school types and all school stages of Z Province, with NU = 5023;
in the embodiment of the invention, the teaching material processing capability evaluation input data set X_u_teacher of the user set U_teacher is the combined matrix of the matrices X_i of the 5023 administrative users in all regions, all school types and all school stages of Z Province, i.e.
Figure BDA0002864911810000251
Step S32: obtaining the manually annotated capability label set of the user set U_teacher, denoted Y_u_teacher, Y_u_teacher = (Y_1, Y_2, ..., Y_i, ..., Y_NU)^T, wherein Y_i is the capability label of each user, Y_i ∈ U_teacher;
The capability label Y_i is determined from the user's self-annotation data and expert annotation data: first, the error e_i = |St_i - Se_i| between the user's self-annotation data St_i and the first expert annotation data Se_i is calculated; if e_i is less than the set critical value E, the capability label Y_i is the average of the two; if e_i is greater than the set critical value E, second expert annotation data Sa_i are acquired, the distances from Sa_i to St_i and to Se_i are calculated respectively, and the capability label Y_i is the average of Sa_i and whichever of the two scores is closer to it; the calculation formula is as follows:
Figure BDA0002864911810000252
Table 9 shows a partial example of users' teaching material processing capability value labels in the embodiment of the present invention, where the critical value E = 20.
Table 9 Example (partial) of user teaching material processing capability value labels
Figure BDA0002864911810000253
Figure BDA0002864911810000261
The self-evaluation score of teacher 1 is St_1 = 100 and the expert evaluation score is Se_1 = 100, so the final teaching material processing score of teacher 1 is Y_1 = (100 + 100)/2 = 100; the self-evaluation score of teacher 2 is St_2 = 75 and the expert evaluation score is Se_2 = 100, so e_2 = |St_2 - Se_2| = |75 - 100| = 25 > E = 20; the evaluation score of the second expert is Sa_2 = 80, and |80 - 100| > |80 - 75|, so the final teaching material processing score of teacher 2 is Y_2 = (80 + 75)/2 = 77.5; the self-evaluation score of teacher 3 is St_3 = 100 and the expert evaluation score is Se_3 = 90, so the final teaching material processing score of teacher 3 is Y_3 = (100 + 90)/2 = 95; the final user teaching material processing capability score matrix is Y = (100, 77.5, ..., 95)^T;
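The labelling rule and the three worked cases can be checked with the Python sketch below. The function and parameter names are hypothetical. Two behaviours are assumptions, because the rule as stated only covers errors strictly below or strictly above E and does not mention ties: an error exactly equal to E is treated like the "less than" case, and when the second expert is equally distant from both scores the self-evaluation is used.

```python
from typing import Optional


def capability_label(self_score: float, expert_score: float, threshold: float,
                     second_expert_score: Optional[float] = None) -> float:
    """Combine self and expert annotations into one capability label Y_i."""
    error = abs(self_score - expert_score)
    # Assumption: error == threshold is handled like the "less than" case.
    if error <= threshold:
        return (self_score + expert_score) / 2
    if second_expert_score is None:
        raise ValueError("a second expert score is needed when the error exceeds E")
    # Assumption: on a tie in distance, the self-evaluation score is used.
    if abs(second_expert_score - self_score) <= abs(second_expert_score - expert_score):
        closer = self_score
    else:
        closer = expert_score
    return (second_expert_score + closer) / 2


E = 20
y1 = capability_label(100, 100, E)       # teacher 1
y2 = capability_label(75, 100, E, 80)    # teacher 2: second expert needed
y3 = capability_label(100, 90, E)        # teacher 3
```

The three results match the final scores 100, 77.5 and 95 given for teachers 1, 2 and 3.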
Step S4: constructing regression models based on a plurality of machine learning methods, wherein each regression model outputs the identified capability label from the input teaching material processing capability matrix; training the regression models with the teaching material processing capability matrix set and the capability label set; and determining the optimal regression model.
The regression models comprise a multiple linear regression model, a random forest regression model, a support vector machine regression model and a BP neural network regression model;
the multiple linear regression model fits a linear regression model by minimizing the sum of squared residuals between the sample users' value labels and the predicted values of the linear model, and the calculation formula of the value label is as follows:
Figure BDA0002864911810000262
wherein Y is the value label, C is a constant, R_picture is the picture richness feature variable, R_audio is the audio richness feature variable, R_video is the video richness feature variable, R_animation is the animation richness feature variable, D_use is the usage diversity feature variable, D_process is the processing type diversity feature variable, D_topic is the theme diversity feature variable, U_average is the average usage feature variable, U_max is the maximum usage feature variable, U_self is the total self-usage feature variable, U_student is the total student usage feature variable, U_pattern is the usage pattern feature variable, Q_share is the average sharing amount feature variable, Q_diffuse is the average propagation amount feature variable, Q_diff_rate is the propagation rate feature variable, Q_collect is the average collection amount feature variable, Q_mcollect is the maximum collection amount feature variable, Q_download is the average download amount feature variable, Q_mdownload is the maximum download amount feature variable, the acceptance rate is likewise a feature variable, Q_score is the average score feature variable, Q_udegree is the used centrality feature variable, Q_utype is the used category feature variable, Q_emotion is the comment emotion tendency feature variable, Q_cdegree is the comment centrality feature variable, T_fre is the update frequency feature variable, T_vol is the volatility feature variable,
Figure BDA0002864911810000271
and ω_1 to ω_26 are the weight coefficients obtained by training, and ε is the error;
the random forest regression model is an algorithm model that uses CART decision trees as weak learners with randomly selected features; T weak learners are trained independently on T rounds of sampling, and the final result is obtained from the regression results of the T weak learners by weighted averaging;
the support vector machine regression model maps the input teaching material processing capability matrix into a high-dimensional feature space through a kernel function to realize regression calculation of the value label, and the calculation formula of the value label is as follows:
Figure BDA0002864911810000272
wherein Y is the value label,
Figure BDA0002864911810000273
and α_i are the Lagrange coefficients, x is the feature variable of the input user's teaching material processing capability attributes,
Figure BDA0002864911810000274
is the transposed form of the feature variable x_i,
Figure BDA0002864911810000275
is the kernel function, satisfying
Figure BDA0002864911810000276
b is a constant;
the BP neural network regression model is a three-layer neural network with an input layer, a hidden layer and an output layer, each composed of neurons, wherein the input layer has 27 neurons in total, the hidden layer has 9 neurons and the output layer has 1 neuron giving the value label, and regression of the value label is realized through full connection of the neurons;
Dividing the sample data formed by the teaching material processing capability matrix set and the capability label set into k groups, taking 1 group of teachers as the test set each time and the remaining k-1 groups of teachers as the training set, and training the regression model k times in turn, wherein k = 10 in the embodiment of the invention;
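The k-fold scheme can be sketched in Python as an index splitter. The function name is hypothetical, and the use of contiguous, unshuffled groups whose sizes differ by at most one is an assumption; the embodiment does not state how the 5023 samples are assigned to the 10 groups.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k rounds."""
    if not 1 < k <= n_samples:
        raise ValueError("k must be greater than 1 and at most n_samples")
    indices = list(range(n_samples))
    # Assumption: contiguous, unshuffled folds whose sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# NU = 5023 sample users and k = 10, as in the embodiment.
splits = list(k_fold_splits(5023, 10))
```

Each round would train a regression model on the training indices and evaluate it on the held-out group, so every user is used for testing exactly once.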
the evaluation index of training is the mean absolute percentage error of the regression model, denoted MAPE, which is calculated as follows:
$$MAPE = \frac{1}{M}\sum_{j=1}^{M}\left|\frac{y'_j - y_j}{y_j}\right| \times 100\%$$
wherein M is the number of users in the test set sample, y'_j is the predicted value of the capability label of teacher j, and y_j is the true value of the capability label of teacher j;
Step S5: comparing the evaluation results of the different regression models, determining the regression model with the smallest MAPE value as the optimal regression model, and using it for dynamic identification of users' teaching material processing capability.
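The evaluation and selection in this step can be sketched in Python as follows. The function names and model keys are hypothetical. Of the per-model MAPE values in the example dictionary, only the 5.29 for the multiple linear regression model comes from the embodiment; the other three are placeholders chosen for illustration.

```python
from typing import Dict, Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is 0")
    total = sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def best_model(mape_by_model: Dict[str, float]) -> str:
    """Name of the model with the smallest MAPE."""
    return min(mape_by_model, key=mape_by_model.get)


example_mape = mape([100, 80], [90, 88])  # each prediction is off by 10%

# Only the linear-regression value is from the embodiment; the rest are hypothetical.
scores = {
    "multiple_linear": 5.29,
    "random_forest": 11.0,
    "svm": 12.5,
    "bp_network": 14.0,
}
selected = best_model(scores)
```

In a k-fold setting the MAPE of each model would typically be averaged over the k test groups before the comparison; that averaging is an assumption, not something the text spells out.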
In the embodiment of the invention, the average MAPE value of the four regression models is 10.76%, which confirms overall the validity of the regression models for automatically identifying users' teaching material processing capability based on multi-source data fusion; the loss function MAPE value of the model L_1 based on multiple linear regression is only 5.29%, so L_1 is finally selected as the optimal regression model, i.e. the final regression model for automatically identifying users' teaching material processing capability;
Optimal regression model L 1 Picture richness R_picture, audio richness R_audio, video richness R_video, usage diversity D_use, theme diversity D_topic, processing type diversity D_process, average usage U_average, maximum usage U_maxFrom usage U_self, student usage U_student, usage U_parameter, usage pattern U_parameter, average share Q_share, average share Q_collect, maximum share Q_mcollect, average download Q_download, maximum download Q_mdowload, average score Q_score, used centrality Q_udetree, used category Q_utype, comment emotion trend Q_score, comment centrality Q_cdegreee, update frequency T_fre, volatility T_vol as independent variable, and stepwise regression analysis is performed using user teaching and handling capacity value tags as dependent variable, through model automatic identification, finally remaining picture richness, usage diversity, average share, maximum download, total 7 items in model, R direction value is 0.716, meaning picture richness, usage diversity, average share, average download, maximum share average score, and final fluctuation cause of 71. And the model passed F test (f=34.208, p=0.000 <0.05 The model is valid. In addition, the multiple collinearity of the model is checked, and the VIF values in the model are all smaller than 5, which means that the problem of collinearity does not exist; and the D-W value (D-w=2.016) is near the number 2, so that the model has no autocorrelation, no association relationship between sample data exists, and the model is good. Table 10 shows a stepwise regression model L according to an embodiment of the present invention 1 Specific results of (3).
TABLE 10 Specific results of the stepwise regression model L_1 according to an embodiment of the invention.
Figure BDA0002864911810000291
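The goodness-of-fit and autocorrelation checks quoted above (R² and the D-W value) can be reproduced from a model's predictions and residuals. The following is a minimal sketch using the standard textbook definitions of both statistics; the function names are illustrative and are not taken from the patent.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def durbin_watson(residuals: Sequence[float]) -> float:
    """Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator
```

The statistic ranges from 0 to 4, which is why a value such as 2.016 is read as "no autocorrelation".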
The final regression equation is:
$$Y_{U\_teacher} = \ldots \times R\_picture + 26.389 \times D\_use + 16.463 \times U\_average - 1.153 \times Q\_share + 19.064 \times Q\_collect + 4.927 \times Q\_mdown - 35.233 \times T\_vol;$$
taking teacher 10626, a male primary-school mathematics teacher in province Z who is not part of the sample, as an example, the automatic identification result for teaching material processing capability is described;
for teacher 10626, the picture richness R_picture is 1.08, the usage diversity D_use is 1.00, the average usage U_average is 0.54, the average sharing Q_share is 3.93, the average collection Q_collect is 0.00, the maximum download amount Q_mdown is 0.49, and the volatility T_vol is 0.31;
based on the capability evaluation model L_{u_teacher}, the teaching material processing capability score of user 10626 is automatically calculated to be 71.14.
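As a rough illustration of how the regression equation is applied to one teacher, the sketch below evaluates the six terms whose coefficients are legible in the equation above. The constant and the picture richness coefficient are not given in this text, so they are left as parameters (`constant` and `w_picture` are hypothetical names) rather than guessed.

```python
from typing import Dict

# Coefficients as stated in the final regression equation.
KNOWN_COEFFICIENTS: Dict[str, float] = {
    "D_use": 26.389,
    "U_average": 16.463,
    "Q_share": -1.153,
    "Q_collect": 19.064,
    "Q_mdown": 4.927,
    "T_vol": -35.233,
}


def score(features: Dict[str, float], constant: float, w_picture: float) -> float:
    """Linear score; constant and w_picture must be supplied by the caller."""
    total = constant + w_picture * features["R_picture"]
    for name, weight in KNOWN_COEFFICIENTS.items():
        total += weight * features[name]
    return total


# Feature values of teacher 10626 as listed above.
teacher_10626 = {
    "R_picture": 1.08, "D_use": 1.00, "U_average": 0.54, "Q_share": 3.93,
    "Q_collect": 0.00, "Q_mdown": 0.49, "T_vol": 0.31,
}

# Contribution of the six known terms only (constant and picture term set to 0).
known_part = score(teacher_10626, constant=0.0, w_picture=0.0)
```

With these inputs the six known terms contribute about 22.24 of the reported score of 71.14; the remainder comes from the constant and the picture richness term.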
S6, dynamically identifying the user's teaching material processing capability by using the trained optimal regression model.
Collecting user update data at time t;
dynamically updating the capability label of the user based on the user updating data and the trained optimal regression model;
the multi-source procedural data on the teaching material processing capability of teacher 10626 is updated with half a year as the update period C. On February 29, 2020, teacher 10626 had picture richness R_picture = 1.28, usage diversity D_use = 1.00, average usage U_average = 0.64, average sharing Q_share = 3.93, average collection Q_collect = 0.00, maximum download amount Q_mdown = 0.49 and volatility T_vol = 0.37, giving a final score of …. On August 30, 2020, teacher 10626 had picture richness R_picture = 2.68, usage diversity D_use = 1.00, average usage U_average = 0.81, average sharing Q_share = 3.23, average collection Q_collect = 0.01, maximum download amount Q_mdown = 0.49 and volatility T_vol = 0.38, giving a final score of 84.81;
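The periodic update described above could be organised roughly as follows. This is only a sketch: the `CapabilityTracker` class, the 182-day reading of "half a year", and the callable model interface are assumptions made for illustration, not details stated in the patent.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


@dataclass
class CapabilityTracker:
    """Keeps a dated history of capability labels for one user."""
    model: Callable[[Features], float]        # trained regression model (assumed callable)
    period: timedelta = timedelta(days=182)   # update period C; exact length is an assumption
    history: List[Tuple[date, float]] = field(default_factory=list)

    def update(self, when: date, features: Features) -> bool:
        """Re-score the user if at least one full period has passed since the last label."""
        if self.history and when - self.history[-1][0] < self.period:
            return False
        self.history.append((when, self.model(features)))
        return True
```

A caller would construct the tracker with the trained optimal regression model and call `update` whenever fresh platform data arrives; calls that fall inside the current period leave the label unchanged.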
An embodiment of the invention discloses an automatic identification system for users' teaching material processing capability based on multi-source data fusion, comprising:
the predefining module is used for predefining attributes of processing capacity of the user teaching materials and feature variables contained in each attribute;
the data acquisition module is used for acquiring user data from the teaching platform and carrying out multi-source data fusion on the user data along the behavior, content and social dimensions to determine the values of the feature variables of each attribute of each user's teaching material processing capability, wherein the tuple composed of the values of all feature variables of all attributes forms the teaching material processing capability matrix of that user;
the sample acquisition module is used for setting screening conditions to select a user set, acquiring the set of teaching material processing capability matrices corresponding to the user set, and further acquiring the manually annotated capability label set of the user set;
the training module is used for constructing a plurality of regression models, which output identified capability labels from the input teaching material processing capability matrix, and for training the regression models with the teaching material processing capability matrix set and the capability label set to determine an optimal regression model;
And the identification module is used for dynamically identifying the processing capacity of the user teaching materials by utilizing the trained optimal regression model.
The implementation principle and technical effect of the system are similar to those of the method, and are not repeated here.
It should be noted that, in any of the above embodiments, the steps of the method need not be executed in the order of their sequence numbers; unless the execution logic requires a particular order, they may be executed in any other feasible order.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (8)

1. A method for automatically identifying processing capacity of a user teaching material based on multi-source data fusion is applied to a teaching platform supporting processing or management of the teaching material, and is characterized by comprising the following steps:
s1, defining attributes of processing capacity of teaching materials of users in advance, wherein each attribute comprises a characteristic variable; the attribute of the processing capability of the teaching materials comprises richness, diversity, availability, usefulness and timeliness;
The richness is used for representing the quantity distribution characteristics of teaching materials in different file formats;
the diversity is used for representing the application of teaching materials and the distribution characteristics of processing types;
the availability is used for representing the use characteristics of the teaching materials by an uploader of the teaching materials;
the usefulness is used for representing the approval characteristics of other people except the uploader of the teaching materials on the teaching materials;
the timeliness is used for representing fluctuation characteristics of the updating frequency of the teaching materials;
wherein the richness comprises 4 feature variables of picture richness, audio richness, video richness and animation richness;
the diversity comprises 3 characteristic variables of usage diversity, processing type diversity and theme diversity;
the availability comprises 5 characteristic variables of average usage amount, maximum usage amount, self-usage total amount, student usage total amount and usage mode;
the usefulness comprises 13 characteristic variables of average sharing quantity, average transmission quantity, transmission rate, average collection quantity, maximum collection quantity, average downloading quantity, maximum downloading quantity, acceptance rate, average grading, used centrality, used category, comment emotion tendency and comment centrality;
The timeliness comprises 2 characteristic variables of update frequency and volatility;
s2, collecting user data from the teaching platform, carrying out multi-source data fusion according to an analysis method of the user data based on behaviors, contents and social dimensions, and determining values of characteristic variables of attributes of teaching material processing capabilities of each user, wherein a matrix of the teaching material processing capabilities of each user is formed by an array of values of all the characteristic variables of all the attributes of the teaching material processing capabilities of each user;
s3, selecting a user set, acquiring the set of teaching material processing capability matrices corresponding to the user set, and further acquiring the manually annotated capability label set of the user set;
s4, constructing multiple regression models based on multiple machine learning methods, wherein the regression models output identified capability labels from the input teaching material processing capability matrix, training the regression models with the teaching material processing capability matrix set and the capability label set, and determining an optimal regression model;
s5, dynamically identifying the processing capacity of the user teaching materials by using the trained optimal regression model.
2. The automatic recognition method for processing and processing capabilities of user teaching materials based on multi-source data fusion according to claim 1, wherein the user data comprises user basic data, teaching material basic data, teaching material tag data, teaching material use behavior data, teaching material scoring behavior data and teaching material comment behavior data;
the user basic data comprises user id, user name, user role, user gender, user age, region, school type, taught school segment and taught discipline;
the basic data of the teaching materials comprise a teaching material id, a teaching material name, a material form, a material use and a processing type;
the teaching material tag data comprises a teaching material id, a tag name and a tag weight;
the teaching material use behavior data comprise a use behavior id, a user, a use behavior action, a teaching material, a behavior time and a behavior source;
the teaching material scoring behavior data comprises scoring behavior id, a user, teaching materials, scoring scores and behavior time;
the comment behavior data of the teaching materials comprise an evaluation behavior id, a user, teaching materials, comment content and behavior time.
3. The automatic recognition method for processing capacity of user teaching materials based on multi-source data fusion according to claim 1, wherein the behavior-dimension-based analysis method comprises descriptive statistical analysis and K-means cluster analysis, and is mainly used for calculating the picture richness, audio richness, video richness, animation richness, usage diversity, processing type diversity, average usage, maximum usage, self usage, student usage, usage pattern, average sharing, average propagation, propagation rate, average collection, maximum collection, average download amount, maximum download amount, approval rate, average score, used category, update frequency and volatility feature variables;
the content-dimension-based analysis method comprises multidimensional scaling analysis and emotional tendency analysis, and is mainly used for calculating the theme diversity and comment emotional tendency feature variables;
the social-dimension-based analysis method comprises social network analysis, and is mainly used for calculating the used centrality and comment centrality feature variables.
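For the social-dimension variables, a common social network measure is normalised degree centrality. The sketch below assumes that "used centrality" and "comment centrality" are degree-based, which the variable names suggest but this text does not define; the edge list format is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def degree_centrality(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, float]:
    """Normalised degree centrality: number of distinct neighbours / (n - 1)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops, e.g. a user using their own material
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}
```

Here an edge `(u, v)` would stand for "user u used (or commented on) a material uploaded by user v"; a teacher whose materials are used by many distinct users receives a value close to 1.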
4. The automatic recognition method for processing and processing capabilities of user teaching materials based on multi-source data fusion according to claim 1, wherein the step S3 comprises the following steps:
S31, selecting the user set according to the dimensions of the region where the user is located, the school type, the school stage taught and the discipline taught, and denoting it as U_teacher, with the number of users in the set denoted NU; acquiring the teaching material processing capability matrix $X_i$ of each user in the user set to form the corresponding teaching material processing capability matrix set, denoted $X$, $X = (X_1, X_2, \ldots, X_i, \ldots, X_{NU})^T$, where $X_i \in U\_teacher$;
S32, acquiring the manually annotated capability label set of the user set U_teacher, denoted $Y_{u\_teacher}$, $Y_{u\_teacher} = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_{NU})^T$, where $Y_i$ is the capability label of each user, $Y_i \in U\_teacher$;
the capability label $Y_i$ is determined from the user's self-annotation data and expert annotation data: first, the error value $e_i = |St_i - Se_i|$ between the user's self-annotation data $St_i$ and the first expert annotation data $Se_i$ is calculated; if $e_i$ is less than a set critical value $E$, the capability label $Y_i$ is the average of the two; if $e_i$ is greater than the set critical value $E$, second expert annotation data $Sa_i$ is acquired, the distances from $Sa_i$ to $St_i$ and to $Se_i$ are calculated respectively, and the capability label $Y_i$ is the average of $Sa_i$ and whichever of the two scores is closer to it; the calculation formula is as follows:
Figure FDA0004235691220000041
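The labelling rule of this claim can be sketched as a small function. Two details are not specified in the text and are assumptions here: the case $e_i = E$ is treated like $e_i > E$, and a tie between the two distances is resolved in favour of the self-annotation score. The function and parameter names are illustrative.

```python
from typing import Optional


def capability_label(
    self_score: float,
    first_expert: float,
    threshold: float,
    second_expert: Optional[float] = None,
) -> float:
    """Combine self-annotation and expert annotation into one capability label."""
    if abs(self_score - first_expert) < threshold:
        # Scores agree closely enough: use their mean.
        return (self_score + first_expert) / 2
    if second_expert is None:
        raise ValueError("scores disagree; a second expert annotation is required")
    # Pick whichever of the two original scores is closer to the second expert.
    if abs(second_expert - self_score) <= abs(second_expert - first_expert):
        closer = self_score
    else:
        closer = first_expert
    return (second_expert + closer) / 2
```

For example, with a threshold of 5, scores of 80 and 84 yield 82, while scores of 60 and 80 with a second expert score of 75 yield 77.5.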
5. the automatic recognition method for processing capacity of user teaching materials based on multi-source data fusion according to claim 1, wherein the regression models comprise a multiple linear regression model, a random forest regression model, a support vector machine regression model and a BP neural network regression model;
The multiple linear regression model is to fit a linear regression model by minimizing the sum of squares of residuals between the value labels of the sample users and the predicted values of the linear model, and the value labels are calculated by the following formula:
Figure FDA0004235691220000042
wherein Y is the value label; C is a constant; R_picture, R_audio, R_video and R_animation are the picture, audio, video and animation richness feature variables; D_use, D_process and D_topic are the usage, processing type and theme diversity feature variables; U_average, U_max, U_self, U_student and U_pattern are the average usage, maximum usage, total self-usage, total student usage and usage pattern feature variables; Q_share, Q_collect, Q_mcollect, Q_download, Q_mdownload, Q_score, Q_udegree, Q_utype and Q_cdegree are the average sharing, average collection, maximum collection, average download, maximum download, average score, used centrality, used category and comment centrality feature variables; the average propagation amount, propagation rate, approval rate and comment emotional tendency are the remaining usefulness feature variables; T_fre and T_vol are the update frequency and volatility feature variables,
Figure FDA0004235691220000051
$\omega_1 \sim \omega_{26}$ are the weight coefficients obtained by training, and $\varepsilon$ is the error;
the random forest regression model is an algorithm model that uses CART decision trees as weak learners and selects features at random; T weak learners are trained independently on T rounds of sampling, and the final result is obtained by combining the regression results of the T weak learners with a weighted average;
the support vector machine regression model maps the input teaching material processing capability matrix into a high-dimensional feature space through a kernel function to realize regression calculation of the value label, and the calculation formula of the value label is as follows:
Figure FDA0004235691220000052
wherein Y is the value label,
Figure FDA0004235691220000053
and $\alpha_i$ are Lagrange coefficients, x is the feature variable of the input user's teaching material processing attributes,
Figure FDA0004235691220000054
is the transposed form of the feature variable $x_i$,
Figure FDA0004235691220000055
is the kernel function, which satisfies
Figure FDA0004235691220000056
and b is a constant;
the BP neural network regression model is a three-layer neural network with an input layer, a hidden layer and an output layer, each composed of a number of neurons, wherein the input layer has 27 neurons in total, the hidden layer has 9 neurons, and the output layer has 1 neuron giving the value label; regression of the value label is realized through full connection between the neurons.
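The 27-9-1 layout of the BP network can be illustrated with a forward pass. This is a minimal sketch only: the sigmoid hidden activation, the linear output, and the random initialisation range are assumptions, since the claim specifies the layer sizes but not these details, and training by back-propagation is omitted.

```python
import math
import random
from typing import List, Tuple

Network = Tuple[List[List[float]], List[float], List[float], float]


def init_network(n_in: int = 27, n_hidden: int = 9, seed: int = 0) -> Network:
    """Randomly initialise weights for an n_in -> n_hidden -> 1 fully connected net."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def forward(x: List[float], net: Network) -> float:
    """One forward pass: sigmoid hidden layer, linear output neuron."""
    w_hidden, b_hidden, w_out, b_out = net
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

The input vector `x` would hold the 27 feature variable values of one user, and the single output is the predicted value label.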
6. The automatic recognition method for processing and processing capabilities of user teaching materials based on multi-source data fusion according to claim 1, wherein the step S4 comprises the following steps:
dividing the sample data formed by the teaching material processing capability matrix set and the capability label set into k groups; each time, taking 1 group of teachers as the test set and the remaining k-1 groups of teachers as the training set, so that the regression model is trained k times in turn;
the evaluation value for training is the mean absolute percentage error of the regression model, denoted MAPE, which is calculated as follows:
Figure FDA0004235691220000057
wherein M is the number of users in the test set sample, $y'_j$ is the predicted value of the capability label of teacher j, and $y_j$ is the true value of the capability label of teacher j;
and comparing the evaluation effects of different regression models, and determining the regression model with the minimum MAPE value as the optimal regression model.
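The k-fold evaluation and MAPE-based model selection of this claim can be sketched as follows. The MAPE here is the standard definition (mean of |y' - y| / |y|, times 100); the interleaved fold assignment and the `fit` interface, which takes training data and returns a prediction function, are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]
Fitter = Callable[[List[Sequence[float]], List[float]], Predictor]


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validated_mape(X: List[Sequence[float]], y: List[float], k: int, fit: Fitter) -> float:
    """Average test-fold MAPE over k folds (requires len(y) >= k)."""
    folds = [list(range(i, len(y), k)) for i in range(k)]  # interleaved split
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(
            mape([y[i] for i in test_idx], [predict(X[i]) for i in test_idx])
        )
    return sum(scores) / len(scores)


def select_best(
    models: Dict[str, Fitter], X: List[Sequence[float]], y: List[float], k: int
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the model with the smallest cross-validated MAPE."""
    results = {name: cross_validated_mape(X, y, k, fit) for name, fit in models.items()}
    return min(results, key=results.get), results
```

Each candidate (multiple linear regression, random forest, support vector regression, BP network) would be wrapped as a `fit` function and passed in `models`; the one with the smallest average MAPE is kept as the optimal regression model.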
7. The automatic recognition method for processing and processing capabilities of user teaching materials based on multi-source data fusion as set forth in claim 1, further comprising the step of S6:
collecting user update data at time t;
and dynamically updating the capability label of the user based on the user updating data and the trained optimal regression model.
8. The automatic recognition system for processing and processing capabilities of the user teaching materials based on multi-source data fusion is applied to a teaching platform supporting processing and management of the teaching materials, and is characterized by comprising the following components:
the predefining module is used for predefining attributes of processing capacity of the user teaching materials and feature variables contained in each attribute; the attribute of the processing capability of the teaching materials comprises richness, diversity, availability, usefulness and timeliness;
the richness is used for representing the quantity distribution characteristics of teaching materials in different file formats;
the diversity is used for representing the application of teaching materials and the distribution characteristics of processing types;
the availability is used for representing the use characteristics of the teaching materials by an uploader of the teaching materials;
the usefulness is used for representing the approval characteristics of other people except the uploader of the teaching materials on the teaching materials;
the timeliness is used for representing fluctuation characteristics of the updating frequency of the teaching materials;
wherein the richness comprises 4 feature variables of picture richness, audio richness, video richness and animation richness;
the diversity comprises 3 characteristic variables of usage diversity, processing type diversity and theme diversity;
The availability comprises 5 characteristic variables of average usage amount, maximum usage amount, self-usage total amount, student usage total amount and usage mode;
the usefulness comprises 13 characteristic variables of average sharing quantity, average transmission quantity, transmission rate, average collection quantity, maximum collection quantity, average downloading quantity, maximum downloading quantity, acceptance rate, average grading, used centrality, used category, comment emotion tendency and comment centrality;
the timeliness comprises 2 characteristic variables of update frequency and volatility;
the data acquisition module is used for acquiring user data from the teaching platform and carrying out multi-source data fusion on the user data along the behavior, content and social dimensions to determine the values of the feature variables of each attribute of each user's teaching material processing capability, wherein the tuple composed of the values of all feature variables of all attributes forms the teaching material processing capability matrix of that user;
the sample acquisition module is used for setting screening conditions to select a user set, acquiring the set of teaching material processing capability matrices corresponding to the user set, and further acquiring the manually annotated capability label set of the user set;
the training module is used for constructing a plurality of regression models, which output identified capability labels from the input teaching material processing capability matrix, and for training the regression models with the teaching material processing capability matrix set and the capability label set to determine an optimal regression model;
and the identification module is used for dynamically identifying the processing capacity of the user teaching materials by utilizing the trained optimal regression model.
CN202011583583.7A 2020-12-28 2020-12-28 Automatic identification method and system for processing capability of user teaching materials Active CN112699933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011583583.7A CN112699933B (en) 2020-12-28 2020-12-28 Automatic identification method and system for processing capability of user teaching materials

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011583583.7A CN112699933B (en) 2020-12-28 2020-12-28 Automatic identification method and system for processing capability of user teaching materials

Publications (2)

Publication Number Publication Date
CN112699933A CN112699933A (en) 2021-04-23
CN112699933B true CN112699933B (en) 2023-07-07

Family

ID=75513027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011583583.7A Active CN112699933B (en) 2020-12-28 2020-12-28 Automatic identification method and system for processing capability of user teaching materials

Country Status (1)

Country Link
CN (1) CN112699933B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116167669B (en) * 2023-04-26 2023-07-21 国网浙江省电力有限公司金华供电公司 Carbon emission assessment method based on power consumption regression

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846530A (en) * 2018-09-28 2018-11-20 国网上海市电力公司 One kind being based on the short-term load forecasting method of " cluster-recurrence " model
CN109191953A (en) * 2018-11-12 2019-01-11 重庆靶向科技发展有限公司 A kind of intelligentized system of teaching and learning and method
CN111275239A (en) * 2019-12-20 2020-06-12 西安电子科技大学 Multi-mode-based networked teaching data analysis method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140272914A1 (en) * 2013-03-15 2014-09-18 William Marsh Rice University Sparse Factor Analysis for Learning Analytics and Content Analytics
WO2017013667A1 (en) * 2015-07-17 2017-01-26 Giridhari Devanathan Method for product search using the user-weighted, attribute-based, sort-ordering and system thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
在线学习社区发帖质量评价的回归模型研究;刘金晶;王丽英;;南京师范大学学报(工程技术版)(第01期);全文 *
融合网络学习空间过程性数据的中小学教师信息素养评估研究;李亚婷;陈敏;王欢;周驰;王会军;;中国电化教育(第09期);全文 *

Also Published As

Publication number Publication date
CN112699933A (en) 2021-04-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant