CN112699933B

CN112699933B - Automatic identification method and system for processing capability of user teaching materials

Info

Publication number: CN112699933B
Application number: CN202011583583.7A
Authority: CN
Inventors: 吴砥; 陈敏; 李亚婷; 徐建
Original assignee: Central China Normal University
Current assignee: Central China Normal University
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2023-07-07
Anticipated expiration: 2040-12-28
Also published as: CN112699933A

Abstract

The invention discloses a method and a system for automatically identifying a user's teaching material processing capability based on multi-source data fusion. The method comprises the following steps: S1, predefining the attributes of the user's teaching material processing capability and the feature variables contained in each attribute, wherein the attributes comprise richness, diversity, availability, usefulness and timeliness; S2, collecting user data from the teaching platform and calculating a teaching material processing capability matrix for each user from the user data; S3, acquiring sample data for a user set; S4, constructing regression models based on a plurality of machine learning methods, training the regression models with the sample data, and determining an optimal regression model; S5, dynamically identifying the user's teaching material processing capability with the trained optimal regression model. The invention enables intelligent, automatic identification of users' teaching material processing capability.

Description

Automatic identification method and system for processing capability of user teaching materials

Technical Field

The invention belongs to the field of education informatization, and particularly relates to a method and a system for automatically identifying processing capacity of user teaching materials based on multi-source data fusion.

Background

With the development of computer technology, teaching platforms that assist teaching have become important information carriers. Such platforms include, but are not limited to, regional education resource public service platforms, online teaching platforms, online teacher training and research platforms, online training platforms and education management platforms. In platform-based teaching, identifying the teaching material processing capability of users such as teachers is very important.

At present, a user's teaching material processing capability is still identified by questionnaire, for example by self-evaluation through a scale or test questions. This approach captures only the user's current state, is subjective, and requires a high degree of cooperation. It also ignores the objective process data generated while the user processes teaching materials, which leads to inaccurate identification, low identification efficiency and low data utilization. How to use computer technology to achieve more objective, more accurate and more continuous automatic identification based on users' data on a teaching platform is an important problem. The prior art offers no mature computer-based automatic identification technology.

Disclosure of Invention

Aiming at at least one defect or improvement requirement of the prior art, the invention provides a method and a system for automatically identifying a user's teaching material processing capability based on multi-source data fusion, which enable intelligent, automatic identification of that capability.

In order to achieve the above object, according to a first aspect of the present invention, there is provided a method for automatically identifying processing capability of a user teaching material based on multi-source data fusion, applied to a teaching platform supporting processing or management of the teaching material, including the steps of:

S1, predefining the attributes of the user's teaching material processing capability, wherein each attribute comprises feature variables;

S2, collecting user data from the teaching platform, carrying out multi-source data fusion by applying analysis methods based on the behavior, content and social dimensions to the user data, and determining the values of the feature variables of each attribute of each user's teaching material processing capability, wherein the tuple formed by the values of all feature variables of all attributes constitutes the teaching material processing capability matrix of each user;

S3, selecting a user set, acquiring the set of teaching material processing capability matrices corresponding to the user set, and further acquiring the manually annotated capability label set of the user set;

S4, constructing a plurality of regression models based on a plurality of machine learning methods, wherein each regression model outputs an identified capability label from an input teaching material processing capability matrix, training the regression models with the set of teaching material processing capability matrices and the capability label set, and determining an optimal regression model;

S5, dynamically identifying the user's teaching material processing capability with the trained optimal regression model.

Preferably, the attributes of the teaching material processing capability comprise richness, diversity, availability, usefulness and timeliness;

the richness is used for representing the quantity distribution characteristics of teaching materials in different file formats;

the diversity is used for representing the application of teaching materials and the distribution characteristics of processing types;

the availability is used for representing the use characteristics of the teaching materials by an uploader of the teaching materials;

the usefulness is used for representing the approval characteristics of other people except the uploader of the teaching materials on the teaching materials;

the timeliness is used for representing fluctuation characteristics of the updating frequency of the teaching materials.

Preferably, the richness comprises 4 feature variables of picture richness, audio richness, video richness and animation richness;

The diversity comprises 3 characteristic variables of usage diversity, processing type diversity and theme diversity;

the availability comprises 5 characteristic variables of average usage amount, maximum usage amount, self-usage total amount, student usage total amount and usage mode;

the usefulness comprises 13 characteristic variables of average sharing quantity, average transmission quantity, transmission rate, average collection quantity, maximum collection quantity, average downloading quantity, maximum downloading quantity, acceptance rate, average grading, used centrality, used category, comment emotion tendency and comment centrality;

the timeliness includes 2 characteristic variables of update frequency and volatility.
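The attribute and feature-variable scheme listed above can be written down as a small lookup table. The following is a minimal sketch; the snake_case identifiers and the helper names `feature_names` and `capability_vector` are illustrative assumptions and do not come from the patent text.

```python
# Attribute -> feature variables, as enumerated in the text.
# The snake_case identifiers are illustrative assumptions.
CAPABILITY_ATTRIBUTES = {
    "richness": ["picture_richness", "audio_richness",
                 "video_richness", "animation_richness"],
    "diversity": ["usage_diversity", "processing_type_diversity",
                  "theme_diversity"],
    "availability": ["average_usage", "maximum_usage", "self_usage_total",
                     "student_usage_total", "usage_mode"],
    "usefulness": ["average_sharing", "average_propagation", "propagation_rate",
                   "average_collection", "maximum_collection",
                   "average_download", "maximum_download", "acceptance_rate",
                   "average_score", "used_centrality", "used_category",
                   "comment_emotion_tendency", "comment_centrality"],
    "timeliness": ["update_frequency", "volatility"],
}


def feature_names(attributes=CAPABILITY_ATTRIBUTES):
    """Flatten the attribute table into one ordered list of feature names."""
    return [name for names in attributes.values() for name in names]


def capability_vector(values):
    """Order a {feature_name: value} mapping into one user's feature tuple."""
    return tuple(float(values[name]) for name in feature_names())
```

Flattening the table gives 4 + 3 + 5 + 13 + 2 = 27 feature variables per user, which matches the 27 input neurons of the BP neural network described later in the text.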

Preferably, the user data comprises user basic data, teaching material basic data, teaching material tag data, teaching material use behavior data, teaching material scoring behavior data and teaching material comment behavior data;

the user basic data comprises user id, user name, user role, user gender, user age, region, school type, taught school segment and taught discipline;

the basic data of the teaching materials comprise a teaching material id, a teaching material name, a material form, a material use and a processing type;

The teaching material tag data comprises a teaching material id, a tag name and a tag weight;

the teaching material use behavior data comprise a use behavior id, a user, a use behavior action, a teaching material, a behavior time and a behavior source;

the teaching material scoring behavior data comprises scoring behavior id, a user, teaching materials, scoring scores and behavior time;

the comment behavior data of the teaching materials comprise an evaluation behavior id, a user, teaching materials, comment content and behavior time.

Preferably, the behavior dimension analysis method comprises descriptive statistical analysis and K-means cluster analysis, and is mainly used for calculating picture richness, audio richness, video richness, animation richness, usage diversity, processing type diversity, average usage, maximum usage, self-usage, student usage, usage pattern, average sharing, average propagation, propagation rate, average collection, maximum collection, average download, maximum download, acceptance rate, average score, used category, update frequency and fluctuation characteristic variable;

the content dimension analysis method comprises multidimensional scaling analysis and emotion tendency analysis, and is mainly used for calculating the theme diversity and comment emotion tendency feature variables;

the social dimension analysis method comprises social network analysis, and is mainly used for calculating the used centrality and comment centrality feature variables.

Preferably, the step S3 includes the steps of:

S31, selecting the user set according to the dimensions of the users' region, school type, taught school stage and taught subject, denoting the user set as U_teacher and the number of users in it as NU, and acquiring the teaching material processing capability matrix $X_i$ of each user in the user set to form the corresponding set of teaching material processing capability matrices, denoted $X = (X_1, X_2, \ldots, X_i, \ldots, X_{NU})^T$, wherein $X_i \in$ U_teacher;

S32, acquiring the manually annotated capability label set of the user set U_teacher, denoted $Y_{u\_teacher} = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_{NU})^T$, wherein $Y_i$ is the capability label of each user and $Y_i \in$ U_teacher;

The capability label $Y_i$ is determined from the user's self-annotation data and expert annotation data. First, the error value between the user's self-annotation data $St_i$ and the first expert annotation data $Se_i$ is calculated as $e_i = |St_i - Se_i|$. If $e_i$ is less than a set critical value $E$, the capability label $Y_i$ is the average of the two. If $e_i$ is greater than $E$, second expert annotation data $Sa_i$ is acquired, the distances from $Sa_i$ to $St_i$ and to $Se_i$ are calculated, and $Y_i$ is the average of $Sa_i$ and whichever of the two scores is closer to it. The calculation formula is as follows:

$$Y_i = \begin{cases} \dfrac{St_i + Se_i}{2}, & e_i < E \\[4pt] \dfrac{Sa_i + St_i}{2}, & e_i > E,\ |Sa_i - St_i| < |Sa_i - Se_i| \\[4pt] \dfrac{Sa_i + Se_i}{2}, & e_i > E,\ |Sa_i - St_i| > |Sa_i - Se_i| \end{cases}$$
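The label rule described above can be sketched as a short function. This is an illustration under stated assumptions: the function and parameter names are hypothetical, the case where the error equals the critical value is treated like the "greater than" case, and a tie between the two distances is resolved in favor of the self-annotation score. The source specifies neither boundary case.

```python
def fuse_capability_label(self_score, expert_score, threshold,
                          second_expert_score=None):
    """Combine a self-annotation and expert annotations into one label.

    Assumptions (not stated in the source): an error equal to the
    threshold triggers the second expert, and a distance tie picks
    the self-annotation score.
    """
    error = abs(self_score - expert_score)
    if error < threshold:
        # The two annotations agree closely enough: use their mean.
        return (self_score + expert_score) / 2
    if second_expert_score is None:
        raise ValueError("a second expert annotation is required")
    # Pick whichever first-round score lies closer to the second expert.
    closer = min((self_score, expert_score),
                 key=lambda score: abs(second_expert_score - score))
    return (second_expert_score + closer) / 2
```

For example, with a critical value of 5, scores of 80 and 84 are averaged directly, while scores of 60 and 90 require a second expert annotation; if that annotation is 85, the label is the mean of 85 and 90.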

Preferably, the multiple regression models comprise a multiple linear regression model, a random forest regression model, a support vector machine regression model and a BP neural network regression model;

the multiple linear regression model fits a linear model by minimizing the sum of squared residuals between the value labels of the sample users and the values predicted by the linear model, and the value label is calculated by the following formula:

$$Y = C + \sum_{k} \omega_k x_k + \varepsilon$$

wherein Y is the value label; C is a constant; the $x_k$ are the feature variables, namely: R_picture, R_audio, R_video and R_animation, the picture, audio, video and animation richness feature variables; D_use, D_process and D_topic, the usage, processing type and theme diversity feature variables; U_average, U_max, U_self, U_student and U_pattern, the average usage, maximum usage, total self-usage, total student usage and usage pattern feature variables; Q_share, Q_diffuse, Q_diffuse_rate, Q_collection, Q_mcollect, Q_download, Q_mdown, Q_acceptance, Q_score, Q_udegree, Q_utype, Q_emotion and Q_cdegree, the average sharing quantity, average propagation quantity, propagation rate, average collection quantity, maximum collection quantity, average download quantity, maximum download quantity, acceptance rate, average score, used centrality, used category, comment emotion tendency and comment centrality feature variables; and T_fre and T_vol, the update frequency and volatility feature variables;

ω₁～ω₂₆ are the weight coefficients obtained by training, and ε is the error;

the random forest regression model is an algorithm model that uses CART decision trees as weak learners with randomly selected features; T weak learners are trained independently through T rounds of sampling, and the final result combines the regression results of the T weak learners by weighted averaging;

the support vector machine regression model maps the input teaching material processing capability matrix into a high-dimensional feature space through a kernel function to perform regression of the value label, wherein Y is the value label, $\alpha_i$ are the Lagrange coefficients, x is the vector of feature variables of the input user's teaching material processing attributes, $x_i^T$ is the transpose of the feature vector $x_i$, K is the kernel function, and b is a constant;

the BP neural network regression model is a three-layer neural network with an input layer, a hidden layer and an output layer, each composed of neurons, wherein the input layer has 27 neurons, the hidden layer has 9 neurons, and the output layer has 1 neuron that outputs the value label; regression of the value label is realized through full connection of the neurons.

Preferably, the step S4 includes the steps of:

dividing the sample data formed by the set of teaching material processing capability matrices and the capability label set into k groups, each time holding out 1 group of teachers as the test set and using the remaining k-1 groups of teachers as the training set, and training the regression model over k rounds;

the evaluation value of the trained model is the mean absolute percentage error of the regression model, denoted MAPE, calculated by:

$$MAPE = \frac{100\%}{M}\sum_{j=1}^{M}\left|\frac{y'_j - y_j}{y_j}\right|$$

wherein M is the number of users in the test set sample, $y'_j$ is the predicted value of the capability label of teacher j, and $y_j$ is the true value of the capability label of teacher j;

and comparing the evaluation effects of different regression models, and determining the regression model with the minimum MAPE value as the optimal regression model.
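The k-fold procedure and the MAPE-based selection described above can be sketched in plain Python. The sketch makes several assumptions not fixed by the text: folds are formed by striding over the sample indices, the per-fold MAPE values are averaged into one score per model, and each candidate model is represented by a hypothetical `fit(X, y)` callable that returns a `predict(x)` function. The two toy candidates only stand in for the four regression models named in the text.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i takes indices i, i+k, ..."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


def cross_validated_mape(fit, X, y, k):
    """Average the test-fold MAPE over k train/test splits (an assumption)."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        predict = fit([X[j] for j in train], [y[j] for j in train])
        preds = [predict(X[j]) for j in test]
        scores.append(mape([y[j] for j in test], preds))
    return sum(scores) / len(scores)


def select_best_model(candidates, X, y, k):
    """Return the name of the candidate with the smallest MAPE, plus all scores."""
    scores = {name: cross_validated_mape(fit, X, y, k)
              for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Two toy candidates, used only to exercise the selection logic.
def fit_mean(X, y):
    mean = sum(y) / len(y)
    return lambda x: mean


def fit_proportional(X, y):
    # Least-squares slope through the origin on the first feature.
    w = sum(x[0] * t for x, t in zip(X, y)) / sum(x[0] ** 2 for x in X)
    return lambda x: w * x[0]
```

With `X = [[1], [2], [3], [4]]` and `y = [2, 4, 6, 8]`, calling `select_best_model` with both candidates and `k=2` selects the proportional model, because its held-out predictions are exact.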

Preferably, the method further comprises step S6:

collecting user update data at time t;

and dynamically updating the capability label of the user based on the user updating data and the trained optimal regression model.
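Step S6 amounts to re-running the selected model on feature values recomputed at time t. A minimal sketch follows, assuming a hypothetical `predict` callable (the trained optimal regression model) and a mapping from user id to the refreshed feature vector; neither name comes from the source.

```python
def refresh_capability_labels(predict, feature_vectors_at_t, labels=None):
    """Return capability labels updated with predictions at time t.

    predict: callable mapping one feature vector to a capability label
        (stands in for the trained optimal regression model).
    feature_vectors_at_t: {user_id: feature_vector} recomputed from the
        user update data collected at time t.
    labels: previously stored labels, kept for users without new data.
    """
    updated = dict(labels or {})
    for user_id, vector in feature_vectors_at_t.items():
        updated[user_id] = predict(vector)
    return updated
```

Users with no new data keep their earlier label, which is one possible design choice; the text does not say how stale labels should be handled.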

According to a second aspect of the present invention, there is provided an automatic recognition system for processing capability of a user teaching material based on multi-source data fusion, applied to a teaching platform supporting processing or management of the teaching material, comprising:

the predefining module is used for predefining attributes of processing capacity of the user teaching materials and feature variables contained in each attribute;

the data acquisition module is used for collecting user data from the teaching platform, carrying out multi-source data fusion on the user data across the behavior, content and social dimensions to determine the values of the feature variables of each attribute of each user's teaching material processing capability, and forming the teaching material processing capability matrix of each user from the tuple of the values of all feature variables of all attributes;

the sample acquisition module is used for setting screening conditions to select a user set, acquiring the set of teaching material processing capability matrices corresponding to the user set, and further acquiring the manually annotated capability label set of the user set;

the training module is used for constructing a plurality of regression models, wherein each regression model outputs an identified capability label from an input teaching material processing capability matrix, and for training the regression models with the set of teaching material processing capability matrices and the capability label set to determine an optimal regression model;

and the identification module is used for dynamically identifying the processing capacity of the user teaching materials by utilizing the trained optimal regression model.

In summary, the invention has the advantages and positive effects that:

(1) The multi-dimensional process data in the education platform can be fully utilized to realize intelligent, automatic identification of users' teaching material processing capability that is objective, accurate and continuous. Extra time is spent only when the model is trained in advance; applying the trained model for identification is fast and efficient.

(2) In addition, a time dimension is introduced and an automatic, dynamically updated identification mode is supported, which facilitates large-scale, continuous evaluation work such as assessing users' teaching material processing capability and teachers' information literacy.

(3) The accuracy of intelligent automatic identification can be further improved by optimizing the capability attribute/feature variable and the collected data type.

Drawings

Fig. 1 is a general flowchart of a method for dynamically evaluating users' teaching material processing capability based on multi-source data fusion according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.

Fig. 1 shows a general flowchart of a method for automatically identifying processing capability of a user teaching material based on multi-source data fusion, which is applied to a teaching platform supporting processing or management of the teaching material, and includes the following steps:

s1, defining attributes of processing capacity of the teaching materials of the user and feature variables contained in each attribute in advance.

The attribute of the processing capability of the teaching materials comprises richness, diversity, availability, usefulness and timeliness;

the timeliness is used for representing fluctuation characteristics of the updating frequency of the teaching materials;

the richness comprises 4 characteristic variables of picture richness R_picture, audio richness R_audio, video richness R_video and animation richness R_animation;

The picture richness R_picture is the log-standardized value of the number N_picture of picture teaching materials uploaded by a user; for any user i, the picture richness is calculated as: $R\_picture_i = \log_{10}(N\_picture_i)$;

The audio richness R_audio is the log-standardized value of the number N_audio of audio teaching materials uploaded by a user; for any user i, the audio richness is calculated as: $R\_audio_i = \log_{10}(N\_audio_i)$;

The video richness R_video is the log-standardized value of the number N_video of video teaching materials uploaded by a user; for any user i, the video richness is calculated as: $R\_video_i = \log_{10}(N\_video_i)$;

The animation richness R_animation is the log-standardized value of the number N_animation of animation teaching materials uploaded by a user; for any user i, the animation richness is calculated as: $R\_animation_i = \log_{10}(N\_animation_i)$;
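The four richness variables share one pattern: count a user's uploads per file format and take the base-10 logarithm. A minimal sketch follows. The format identifiers are the ones the description uses for M_format, while the function name and the handling of a zero count (mapped to 0.0, since the logarithm of zero is undefined) are assumptions not covered by the source.

```python
import math
from collections import Counter

FORMATS = ("m_picture", "m_audio", "m_video", "m_animation")


def richness_features(material_formats):
    """Compute log10 richness per format from one user's uploads.

    material_formats: iterable with one format identifier per uploaded
    teaching material. A count of zero maps to 0.0 (an assumption).
    """
    counts = Counter(material_formats)
    features = {}
    for fmt in FORMATS:
        n = counts.get(fmt, 0)
        name = "R_" + fmt[len("m_"):]
        features[name] = math.log10(n) if n > 0 else 0.0
    return features
```

Note that under this zero-handling choice a user with one picture and a user with none both get a picture richness of 0, because $\log_{10}(1) = 0$.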

The diversity comprises 3 characteristic variables: usage diversity D_use, processing type diversity D_process and theme diversity D_topic;

the usage diversity D_use is the ratio of the number N_use of distinct uses among the teaching materials uploaded by a user to the total number Num_of_use of teaching material uses; for any user i, the usage diversity is calculated as: $D\_use_i = N\_use_i / Num\_of\_use$;

The processing type diversity D_process is the ratio of the number N_process of distinct processing types among the teaching materials uploaded by a user to the total number Num_of_process of teaching material processing types; for any user i, the processing type diversity is calculated as: $D\_process_i = N\_process_i / Num\_of\_process$;

the theme diversity D_topic is the ratio of the number N_topic of distinct topics among the teaching materials uploaded by a user to the total number Num_of_topic of teaching material topics; for any user i, the theme diversity is calculated as: $D\_topic_i = N\_topic_i / Num\_of\_topic$;
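All three diversity variables are the same ratio applied to a different category field. A minimal sketch, with a hypothetical function name and argument layout:

```python
def diversity(observed_categories, total_categories):
    """Share of all possible categories that a user's uploads cover.

    observed_categories: one category value per uploaded material
        (a use, a processing type, or a topic).
    total_categories: size of the platform-wide category set
        (Num_of_use, Num_of_process or Num_of_topic in the text).
    """
    if total_categories <= 0:
        raise ValueError("total_categories must be positive")
    return len(set(observed_categories)) / total_categories
```

For instance, a user whose materials cover the pre-class and in-class uses out of three possible uses has a usage diversity of 2/3.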

the availability comprises 5 characteristic variables of average usage U_average, maximum usage U_max, total self-usage U_self, total student usage U_student and usage mode U_pattern;

the average usage U_average is the ratio of the sum of the usage U_each of each teaching material uploaded by the user to the total number N_all of teaching materials uploaded by the user; for any user i, with m indexing the user's uploaded materials, the average usage is calculated as: $U\_average_i = \frac{1}{N\_all_i}\sum_{m=1}^{N\_all_i} U\_each_{i,m}$;

the maximum usage U_max is the log-standardized value of the maximum of the usage U_each over the teaching materials uploaded by the user; for any user i, the maximum usage is calculated as: $U\_max_i = \log_{10}\left(\max_{m} U\_each_{i,m}\right)$;

the total self-usage U_self is the log-standardized value of the sum of the self-usage U_teacher of each teaching material uploaded by the user; for any user i, the total self-usage is calculated as: $U\_self_i = \log_{10}\left(\sum_{m=1}^{N\_all_i} U\_teacher_{i,m}\right)$;

The total student usage U_student is the log-standardized value of the sum of the student usage U_secret of each teaching material uploaded by the user; for any user i, the total student usage is calculated as: $U\_student_i = \log_{10}\left(\sum_{m=1}^{N\_all_i} U\_secret_{i,m}\right)$;
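The four numeric availability variables can be computed from per-material usage counts. The sketch below uses hypothetical function and argument names; mapping a zero count to 0.0 before the logarithm and returning zeros for a user with no uploads are assumptions the source does not address.

```python
import math


def log10_or_zero(value):
    """log10 for positive values; 0.0 otherwise (an assumption)."""
    return math.log10(value) if value > 0 else 0.0


def availability_features(total_usage, self_usage, student_usage):
    """Availability variables for one user.

    Each argument is a list with one count per uploaded material:
    overall usage, usage by the uploader, and usage by students.
    """
    n_all = len(total_usage)
    if n_all == 0:
        return {"U_average": 0.0, "U_max": 0.0,
                "U_self": 0.0, "U_student": 0.0}
    return {
        "U_average": sum(total_usage) / n_all,
        "U_max": log10_or_zero(max(total_usage)),
        "U_self": log10_or_zero(sum(self_usage)),
        "U_student": log10_or_zero(sum(student_usage)),
    }
```

The fifth availability variable, the usage mode, comes from k-means clustering rather than a closed formula and is not covered here.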

the use mode U_pattern is a result of clustering the use modes of all users of the teaching platform for uploading teaching materials based on k-means;

the usefulness comprises 13 characteristic variables: average sharing quantity Q_share, average propagation quantity Q_diffuse, propagation rate Q_diffuse_rate, average collection quantity Q_collection, maximum collection quantity Q_mcollect, average download quantity Q_download, maximum download quantity Q_mdown, acceptance rate Q_acceptance, average score Q_score, used centrality Q_udegree, used category Q_utype, comment emotion tendency Q_emotion and comment centrality Q_cdegree;

the average sharing quantity Q_share is the ratio of the sum of the number of shares Q_share_each of each teaching material uploaded by the user to the total number N_all of teaching materials uploaded by the user; for any user i, with m indexing the user's uploaded materials, it is calculated as: $Q\_share_i = \frac{1}{N\_all_i}\sum_{m=1}^{N\_all_i} Q\_share\_each_{i,m}$;

the average propagation quantity Q_diffuse is the ratio of the sum of the number of views Q_diffuse_each that each teaching material uploaded by the user received through a sharing link to the total number N_all of teaching materials uploaded by the user; for any user i, it is calculated as: $Q\_diffuse_i = \frac{1}{N\_all_i}\sum_{m=1}^{N\_all_i} Q\_diffuse\_each_{i,m}$;

The propagation rate Q_diffuse_rate is the ratio of the average propagation quantity Q_diffuse of the user's uploaded teaching materials to their average sharing quantity Q_share; for any user i, it is calculated as: $Q\_diffuse\_rate_i = Q\_diffuse_i / Q\_share_i$;
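The sharing-related usefulness variables follow directly from two per-material counts. The sketch below is illustrative; the function name is hypothetical, and returning 0.0 for the propagation rate when nothing was shared is an assumption made to avoid division by zero.

```python
def sharing_features(shares_per_material, link_views_per_material):
    """Average sharing, average propagation and propagation rate.

    shares_per_material: times each uploaded material was shared.
    link_views_per_material: views each material got via a sharing link.
    """
    n_all = len(shares_per_material)
    if n_all == 0:
        return {"Q_share": 0.0, "Q_diffuse": 0.0, "Q_diffuse_rate": 0.0}
    q_share = sum(shares_per_material) / n_all
    q_diffuse = sum(link_views_per_material) / n_all
    # Assumption: the rate is 0.0 when the user's materials were never shared.
    rate = q_diffuse / q_share if q_share > 0 else 0.0
    return {"Q_share": q_share, "Q_diffuse": q_diffuse,
            "Q_diffuse_rate": rate}
```

A propagation rate of 3 means that each share produced three link views on average.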

the average collection quantity Q_collection is the ratio of the sum of the number of times Q_collection_each that each teaching material uploaded by the user was added to favorites to the total number N_all of teaching materials uploaded by the user; for any user i, with m indexing the user's uploaded materials, it is calculated as: $Q\_collection_i = \frac{1}{N\_all_i}\sum_{m=1}^{N\_all_i} Q\_collection\_each_{i,m}$;

the maximum collection quantity Q_mcollect is the log-standardized value of the maximum of Q_collection_each over the teaching materials uploaded by the user; for any user i, it is calculated as: $Q\_mcollect_i = \log_{10}\left(\max_{m} Q\_collection\_each_{i,m}\right)$;

the average download quantity Q_download is the ratio of the sum of the number of downloads Q_download_each of each teaching material uploaded by the user to the total number N_all of teaching materials uploaded by the user; for any user i, it is calculated as: $Q\_download_i = \frac{1}{N\_all_i}\sum_{m=1}^{N\_all_i} Q\_download\_each_{i,m}$;

the maximum download quantity Q_mdown is the log-standardized value of the maximum of Q_download_each over the teaching materials uploaded by the user; for any user i, it is calculated as: $Q\_mdown_i = \log_{10}\left(\max_{m} Q\_download\_each_{i,m}\right)$;

The acceptance rate Q_acceptance is the ratio of the sum of the collection quantity Q_collection_each and the download quantity Q_download_each of each teaching material uploaded by the user to the number of views Q_browse_each of each teaching material, and the calculation formula of the acceptance rate of the teaching materials for any user i is as follows:

the average score Q_score is the ratio of the sum of the scores Q_score_each of each teaching material uploaded by the user to the total number N_all of teaching materials uploaded by the user; for any user i, with m indexing the user's uploaded materials, it is calculated as: $Q\_score_i = \frac{1}{N\_all_i}\sum_{m=1}^{N\_all_i} Q\_score\_each_{i,m}$;

the used centrality Q_udegree is the ratio of the total number U_use of other users who have used the teaching materials uploaded by the user to the number U of users minus one; for any user i, it is calculated as: $Q\_udegree_i = U\_use_i / (U - 1)$;
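The used centrality is a normalized degree in the "who used whose materials" network. The sketch below assumes usage events are available as hypothetical `(user_id, owner_id)` pairs and that repeated use by the same person counts once; both are illustrative assumptions.

```python
def used_centrality(usage_events, uploader, total_users):
    """Share of the other platform users who used the uploader's materials.

    usage_events: iterable of (user_id, owner_id) pairs, one per use of
        a teaching material; owner_id is the material's uploader.
    total_users: the number U of users on the platform.
    """
    if total_users < 2:
        return 0.0
    # Count distinct other users; the uploader's own uses are excluded.
    other_users = {user for user, owner in usage_events
                   if owner == uploader and user != uploader}
    return len(other_users) / (total_users - 1)
```

Dividing by U - 1 scales the value to the range 0 to 1, reaching 1 only when every other user has used at least one of the uploader's materials.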

the used category Q_utype is a result of clustering the used modes of all users of the teaching platform for uploading teaching materials based on k-means;

the comment emotion tendency Q_emotion is the ratio of the sum of the number of positive-sentiment comments Q_emotion_each on each teaching material uploaded by the user to the total number N_all of teaching materials uploaded by the user; for any user i, with m indexing the user's uploaded materials, it is calculated as: $Q\_emotion_i = \frac{1}{N\_all_i}\sum_{m=1}^{N\_all_i} Q\_emotion\_each_{i,m}$;

The comment centrality Q_cdegree is the ratio of the total number U_comment of other users who have commented on the teaching materials uploaded by the user to the number U of users; for any user i, it is calculated as: $Q\_cdegree_i = U\_comment_i / U$;

the timeliness comprises 2 characteristic variables: update frequency T_fre and volatility T_vol;

the update frequency T_fre is the average number N_time of teaching material uploads by the user per time period t within the time span T; for any user i, it is calculated as: $T\_fre_i = \frac{1}{T}\sum_{t=1}^{T} N\_time_{i,t}$;

the volatility T_vol is the sum, over the time periods t, of the squared difference between the share of the user's uploads within T that fall in period t and the reference percentage $B_t$ for that period; for any user i, it is calculated as: $T\_vol_i = \sum_{t=1}^{T}\left(\frac{N\_time_{i,t}}{\sum_{s=1}^{T} N\_time_{i,s}} - B_t\right)^2$;

in the embodiment of the invention, T=12 is selected, and the reference percentages of teaching material uploads over the 12 months are: B= {8%,10%,8%,8%,8%, 10%,8%,8%,8%,8% };
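The two timeliness variables can be sketched from a list of upload counts per period. The function names are hypothetical, and a user with no uploads is treated as having a share of zero in every period, which the source does not specify. Any baseline passed in is supplied by the caller; the embodiment's own reference percentages are not reproduced here.

```python
def update_frequency(uploads_per_period):
    """Average number of uploads per period over the whole time span."""
    return sum(uploads_per_period) / len(uploads_per_period)


def volatility(uploads_per_period, baseline_shares):
    """Sum of squared gaps between actual and reference upload shares.

    baseline_shares: reference fraction of uploads expected in each
        period (for example 0.08 for 8%), one value per period.
    """
    if len(uploads_per_period) != len(baseline_shares):
        raise ValueError("one baseline share is needed per period")
    total = sum(uploads_per_period)
    if total == 0:
        # Assumption: with no uploads, every period's share is zero.
        return sum(b * b for b in baseline_shares)
    return sum((n / total - b) ** 2
               for n, b in zip(uploads_per_period, baseline_shares))
```

A user who uploads exactly in line with the reference distribution has a volatility of 0, while a user who uploads everything in a single month is penalized heavily.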

s2, collecting user data from the teaching platform, carrying out multi-source data fusion according to an analysis method of the user data based on behaviors, contents and social dimensions to determine the values of characteristic variables of the attribute of the teaching material processing capability of each user, and forming a teaching material processing capability matrix of each user by a tuple composed of the values of all the characteristic variables of all the attribute of the teaching material processing capability of each user.

The behavior dimension analysis method comprises descriptive statistical analysis and K-means cluster analysis, and is mainly used for calculating picture richness, audio richness, video richness, animation richness, usage diversity, processing type diversity, average usage, maximum usage, self-usage total, student usage total, usage mode, average sharing quantity, average propagation quantity, propagation rate, average collection quantity, maximum collection quantity, average downloading quantity, maximum downloading quantity, acceptance rate, average score, used category, updating frequency and fluctuation characteristic variable.

The social dimension analysis method comprises social network analysis, and is mainly used for calculating the used centrality and comment centrality feature variables. The teaching platform includes, but is not limited to, education application support platforms such as regional education resource public service platforms, online teaching platforms, online teacher training and research platforms, online training platforms and education management platforms; the embodiment of the invention uses the online learning space of the Z Province education resource public service platform, and the data were collected on August 30, 2019;

The user data comprises user basic data, teaching material basic data, teaching material tag data, teaching material use behavior data, teaching material scoring behavior data and teaching material comment behavior data;

the user basic data comprise user id, user name, user role, user gender, user age, region, school type, taught school stage and taught subject, and can be expressed as U= (U_id, U_name, U_type, U_gender, U_age, U_area, U_school, U_section, U_subject);

the value range of the user role U_type comprises teachers, students and others, and can be expressed as: U_type= { u_teacher, u_student, u_other };

the value range of the user gender U_gender is {0,1}, wherein 0 represents female and 1 represents male;

the value range of the school category U_school comprises cities, counties and villages, and can be expressed as follows: u_school= { u_city, u_town, u_count };

the value range of the taught school stage U_section comprises primary school, junior high school, senior high school and none, and can be expressed as U_section= { u_primary, u_junior, u_high,0}, wherein the taught school stage of students and other roles can only take the value 0;

The value range of the taught subject U_subject comprises Chinese, mathematics, English, physics, chemistry, biology, history, politics, geography, society, science, sports, music, art, health, law, information technology, comprehensive practice and none, and can be expressed as U_subject= { u_Chinese, u_maths, u_English, u_physics, u_chemistry, u_biology, u_history, u_politics, u_geography, u_society, u_science, u_sports, u_music, u_arts, u_health, u_legal, u_information technology, u_comprehensive practice,0}, wherein the taught subject of students and other roles can only take the value 0;

Table 1 gives a partial example of the basic data collected for users whose role is teacher in the embodiment of the invention; there are 10625 such users in total;

Table 1. Basic data collected for users whose role is teacher (partial example)

Here, teacher 1, Teacher Li, is a 34-year-old female mathematics teacher at an urban primary school in S City, Z Province;

teacher 2 is a 33-year-old male biology teacher in S City, Z Province;

teacher 10625, Teacher Zhao, is a 45-year-old male science teacher at an urban primary school in D City, Z Province;

The teaching material basic data comprise the material id, material name, material format, material use and processing type, and can be expressed as M = (M_id, M_name, M_format, M_use, M_type);

the material format M_format takes the values picture, audio, video and animation, expressed as M_format = {m_picture, m_audio, m_video, m_animation};

the material use M_use takes the values pre-class use, in-class use and after-class review use, expressed as M_use = {m_before, m_in, m_after}, where the total number of material uses is num_of_use = 3;

the material processing type M_type takes the values conversion, beautification, excerpting and integration, expressed as M_type = {m_convert, m_embellish, m_excerpt, m_integration}, where the total number of material processing types is num_of_process = 4;

Table 2 gives a partial example of the teaching material basic data collected in the embodiment of the invention; there are 95348 teaching materials in total;

Table 2. Teaching material basic data collected (partial example)

M_id	M_name	M_format	M_use	M_type
1	Small ×	m_picture	m_before	m_convert
2	I' x	m_picture	m_in	m_integration
...	...	...	...	...
95348	Class x	m_video	m_in	m_excerpt

The teaching material with M_id 1 is a picture teaching material used for pre-class preview after conversion processing;

the teaching material with M_id 2 is a picture teaching material used in class after integration processing;

the teaching material with M_id 95348 is a video teaching material used in class after excerpt processing (m_excerpt in Table 2);

The teaching material label data comprise the material id, label name and label weight, and can be expressed as L = (M_id, L_name, L_weight);

the label weight L_weight is the number of times the label occurs, with value range [0, +∞);

Table 3 gives a partial example of the teaching material label data collected in the embodiment of the invention; there are 543325 teaching material labels in total;

Table 3. Teaching material label data collected (partial example)

M_id	L_name	L_weight
1	Geometry	5
2	Circle	2
...	...	...
95348	Mathematics	10

Here, the teaching material with M_id 1 is labeled "geometry" 5 times and "circle" 2 times;

the teaching material with M_id 95348 is labeled "mathematics" 10 times;

The teaching material usage behavior data record procedural behaviors such as a user uploading, browsing, collecting, downloading, using and sharing teaching materials. They comprise the usage behavior id, user, usage action, teaching material, behavior time and behavior source, and can be expressed as B = (B_id, U, B_action, M, Time, B_source);

the usage action B_action takes the values upload, browse, collect, download, use and share, expressed as B_action = {b_upload, b_browse, b_collect, b_download, b_use, b_share};

the behavior source B_source takes the values search, sharing and other, expressed as B_source = {b_shared, b_other};

Table 4 gives a partial example of the procedural usage behavior data (uploading, browsing, collecting, downloading, using and sharing teaching materials) collected in the embodiment of the invention; there are 406576 usage behavior records in total;

Table 4. Teaching material usage behavior data collected (partial example)

The usage behavior with B_id 1 is that, at 07:07:03 on September 1, 2018, the user with U_id 1 browsed, via search, the teaching material with M_id 198;

the usage behavior with B_id 2 is that, at 07:08:21 on September 1, 2018, the user with U_id 1 used the teaching material with M_id 198;

the usage behavior with B_id 406576 is that, at 23:23:13 on August 30, 2019, the user with U_id 269 browsed, via sharing, the teaching material with M_id 1376;

The teaching material scoring behavior data comprise the scoring behavior id, user, teaching material, score and behavior time, and can be expressed as S = (S_id, U, M, S_score, Time);

the score S_score takes values in [0, 5];

Table 5 gives a partial example of the data collected on users' scoring of teaching materials; there are 107613 scoring records in total;

Table 5. Teaching material scoring behavior data collected (partial example)

Here, the scoring behavior with S_id 1 is that, at 21:22:20 on September 3, 2018, the user with U_id 1 gave the teaching material with M_id 18 a score of 2;

the scoring behavior with S_id 2 is that, at 14:07:23 on September 18, 2018, the user with U_id 1 gave the teaching material with M_id 1958 a score of 5;

the scoring behavior of s_id 107613 is that a user with u_id 2239 scores teaching material with m_id 18723 for 4.5 at 2019, 8, 23, 58 minutes and 14 seconds;

The teaching material comment behavior data comprise the comment behavior id, user, teaching material, comment content and behavior time, and can be expressed as C = (C_id, U, M, C_comment, Time);

Table 6 gives a partial example of the data collected on users' comments on teaching materials; there are 252123 comment records in total;

Table 6. Teaching material comment behavior data collected (partial example)

C_id	U	M	C_comment	Time
1	U_id=1765	M_id=1	Very helpful	2018-09-01 14:12:25
2	U_id=8872	M_id=1	Not clear	2018-09-02 12:17:03
...	...	...	...	...
252123	U_id=22	M_id=91121	Support	2019-07-31 14:38:04

Here, the comment behavior with C_id 1 is that, at 14:12:25 on September 1, 2018, the user with U_id 1765 commented "very helpful" on the teaching material with M_id 1;

the comment behavior with C_id 2 is that, at 12:17:03 on September 2, 2018, the user with U_id 8872 commented "not clear" on the teaching material with M_id 1;

the comment behavior with C_id 252123 is that, at 14:38:04 on July 31, 2019, the user with U_id 22 commented "support" on the teaching material with M_id 91121;

The intermediate variables needed to compute the feature variables of each user's teaching material processing capability attributes comprise: the number of topics of the teaching platform Num_of_topic; the teaching resource usage pattern U_pattern; the teaching resource used type Q_utype; for the teaching materials uploaded by user i, the total number $N\_all_i$, the number of picture teaching materials $N\_picture_i$, the number of audio teaching materials $N\_audio_i$, the number of video teaching materials $N\_video_i$, the number of animation teaching materials $N\_animation_i$, the number of uses $N\_use_i$, the number of processing types $N\_process_i$, the number of topics $N\_topic_i$, the total number of other users who used them $U\_use_i$ and the total number of users who commented on them $U\_comment_i$; for each teaching material n uploaded by user i, the usage amount $U\_each_{i,n}$, the self-usage amount $U\_teach_{i,n}$, the student usage amount $U\_secret_{i,n}$, the shared amount $Q\_share\_each_{i,n}$, the amount browsed through shared links $Q\_diffuse\_each_{i,n}$, the browsed amount $Q\_browse\_each_{i,n}$, the collected amount $Q\_collect\_each_{i,n}$, the downloaded amount $Q\_download\_each_{i,n}$, the score $Q\_score\_each_{i,n}$ and the positive-emotion comments $Q\_emotion\_each_{i,n}$; and the number of teaching materials $N\_time_{i,t}$ uploaded by user i in each sub-period t of the period T;

The number of topics Num_of_topic of the teaching platform is obtained by multidimensional scaling analysis of the teaching material label network, and its value is 20 in the embodiment of the invention;

the teaching material label network is an undirected network and can be expressed as $G_l = (L, E_l)$, where L denotes all labels and $E_l$ denotes the co-occurrence relations between labels;
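The construction of the label network can be sketched as follows. This is a minimal illustration, assuming that two labels are linked whenever they are attached to the same teaching material and that the edge weight counts such materials; the function name `build_label_network` and the in-memory input format are hypothetical and do not come from the embodiment.

```python
from collections import Counter
from itertools import combinations


def build_label_network(material_labels):
    """Build an undirected co-occurrence network G_l = (L, E_l).

    material_labels maps a material id to the set of label names attached
    to it (a hypothetical in-memory form of the label data). Two labels are
    linked when they appear on the same material; the edge weight is the
    number of materials on which they co-occur (an assumption).
    """
    nodes = set()
    edges = Counter()
    for labels in material_labels.values():
        nodes.update(labels)
        # Sort so that each undirected edge has one canonical key.
        for a, b in combinations(sorted(labels), 2):
            edges[(a, b)] += 1
    return nodes, dict(edges)


# Illustrative data only.
example = {
    1: {"geometry", "circle"},
    2: {"circle", "geometry", "area"},
    3: {"mathematics"},
}
nodes, edges = build_label_network(example)
```

The resulting weighted edge list is the kind of input on which a multidimensional scaling analysis could then be run to estimate the number of platform topics.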

The teaching resource usage pattern U_pattern is the result of K-means clustering of users on their average usage, maximum usage, total self-usage and total student usage, with K = 4;

the teaching resource used type Q_utype is the result of K-means clustering of users on their average sharing, average propagation, average collection, maximum collection, average download and maximum download, with K = 4;
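The two clusterings above can be illustrated with a plain Lloyd-style K-means. This is only a sketch: the embodiment does not state the initialization, distance measure or stopping rule, so Euclidean distance, caller-supplied initial centroids and the sample points below are all assumptions.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of the members of each cluster.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Illustrative users described by (average usage, maximum usage,
# total self-usage, total student usage); values are made up.
points = [
    (0.1, 0.2, 0.1, 0.0), (0.2, 0.1, 0.2, 0.1),
    (2.0, 2.1, 0.1, 0.2), (2.1, 2.0, 0.2, 0.1),
    (0.1, 0.2, 2.0, 2.1), (0.2, 0.1, 2.1, 2.0),
    (2.0, 2.1, 2.0, 2.1), (2.1, 2.0, 2.1, 2.0),
]
initial = [points[0], points[2], points[4], points[6]]  # K = 4
labels, centroids = kmeans(points, initial)
```

The cluster index assigned to a user would then play the role of U_pattern (or of Q_utype when the six sharing, collection and download features are clustered instead).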

The total number $N\_all_i$ of teaching materials uploaded by user i is calculated as: $N\_all_i = |\{B \mid B\_action = b\_upload,\ U = i\}|$;

the number $N\_picture_i$ of picture teaching materials uploaded by user i is calculated as: $N\_picture_i = |\{B \mid B\_action = b\_upload,\ U = i,\ M\_format = m\_picture\}|$;

the number $N\_audio_i$ of audio teaching materials uploaded by user i is calculated as: $N\_audio_i = |\{B \mid B\_action = b\_upload,\ U = i,\ M\_format = m\_audio\}|$;

the number $N\_video_i$ of video teaching materials uploaded by user i is calculated as: $N\_video_i = |\{B \mid B\_action = b\_upload,\ U = i,\ M\_format = m\_video\}|$;

the number $N\_animation_i$ of animation teaching materials uploaded by user i is calculated as: $N\_animation_i = |\{B \mid B\_action = b\_upload,\ U = i,\ M\_format = m\_animation\}|$;

the number of uses $N\_use_i$ of the teaching materials uploaded by user i is calculated as: $N\_use_i = |\{M\_use \mid B\_action = b\_upload,\ U = i\}|$;

the number of processing types $N\_process_i$ of the teaching materials uploaded by user i is calculated as: $N\_process_i = |\{M\_type \mid B\_action = b\_upload,\ U = i\}|$;
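The counting formulas above can be sketched as follows. This is a minimal illustration over an in-memory behavior log; the record layout (dictionaries keyed by `U`, `B_action`, `M`) and the function name `upload_counts` are assumptions made for the example, not the platform's actual storage format.

```python
def upload_counts(behaviors, materials, user_id):
    """Count a user's uploads by format, and distinct uses / processing types.

    behaviors: list of dicts with keys "U", "B_action", "M" (hypothetical layout).
    materials: dict mapping material id -> dict with "M_format", "M_use", "M_type".
    """
    uploads = [
        b for b in behaviors
        if b["B_action"] == "b_upload" and b["U"] == user_id
    ]
    formats = [materials[b["M"]]["M_format"] for b in uploads]
    return {
        "N_all": len(uploads),
        "N_picture": formats.count("m_picture"),
        "N_audio": formats.count("m_audio"),
        "N_video": formats.count("m_video"),
        "N_animation": formats.count("m_animation"),
        # Set cardinality: number of distinct uses and processing types.
        "N_use": len({materials[b["M"]]["M_use"] for b in uploads}),
        "N_process": len({materials[b["M"]]["M_type"] for b in uploads}),
    }


# Illustrative data only.
materials = {
    1: {"M_format": "m_picture", "M_use": "m_before", "M_type": "m_convert"},
    2: {"M_format": "m_picture", "M_use": "m_in", "M_type": "m_integration"},
    3: {"M_format": "m_video", "M_use": "m_in", "M_type": "m_excerpt"},
    4: {"M_format": "m_audio", "M_use": "m_after", "M_type": "m_convert"},
}
behaviors = [
    {"U": 1, "B_action": "b_upload", "M": 1},
    {"U": 1, "B_action": "b_upload", "M": 2},
    {"U": 1, "B_action": "b_upload", "M": 3},
    {"U": 2, "B_action": "b_upload", "M": 4},
    {"U": 2, "B_action": "b_browse", "M": 1},
]
counts = upload_counts(behaviors, materials, user_id=1)
```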

The number of topics $N\_topic_i$ of the teaching materials uploaded by user i is determined by the number of platform topics to which the labels of those materials belong;

the total number of users $U\_use_i$ who have used the teaching materials uploaded by user i is obtained from the relative centrality of the user usage network of the teaching platform;

the user usage network is a directed network and can be expressed as $G_u = (U, E_u)$, where U denotes all users and $E_u$ denotes the relations in which user i uses the teaching resources of user j;

the total number of users $U\_comment_i$ who have commented on the teaching materials uploaded by user i is obtained from the relative centrality of the user comment network of the teaching platform;

the user comment network is a directed network and can be expressed as $G_c = (U, E_c)$, where U denotes all users and $E_c$ denotes the relations in which user i comments on the teaching resources of user j;
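The centrality computation on these two directed networks can be sketched as follows. The sketch assumes that "relative centrality" means relative in-degree centrality, that is, the number of distinct other users pointing at a user divided by the number of other users; this reading and the function name `relative_in_degree` are assumptions, since the embodiment does not spell out the formula here.

```python
def relative_in_degree(users, edges):
    """Relative in-degree centrality on a directed network.

    users: iterable of user ids.
    edges: iterable of (i, j) pairs meaning "user i uses (or comments on)
           the teaching resources of user j".
    Returns a dict: user id -> in-degree / (number of users - 1).
    """
    users = list(users)
    n = len(users)
    in_neighbors = {u: set() for u in users}
    for src, dst in edges:
        if src != dst:  # ignore a user acting on their own materials
            in_neighbors[dst].add(src)  # a set removes repeated edges
    return {
        u: (len(in_neighbors[u]) / (n - 1) if n > 1 else 0.0)
        for u in users
    }


# Illustrative network: users 2 and 3 use materials of user 1 (user 2 twice),
# user 1 uses their own material, user 4 uses a material of user 2.
centrality = relative_in_degree(
    users=[1, 2, 3, 4, 5],
    edges=[(2, 1), (3, 1), (2, 1), (1, 1), (4, 2)],
)
```

The same function applies to the comment network $G_c$ by passing comment edges instead of usage edges.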

The usage amount $U\_each_{i,n}$ of teaching material n uploaded by user i is calculated as: $U\_each_{i,n} = |\{B \mid B\_action = b\_use,\ M = n\}|$, where $n \in \{M \mid B\_action = b\_upload,\ U = i\}$;

the self-usage amount $U\_teach_{i,n}$ of teaching material n uploaded by user i is calculated as: $U\_teach_{i,n} = |\{B \mid B\_action = b\_use,\ M = n,\ U = i\}|$, where $n \in \{M \mid B\_action = b\_upload,\ U = i\}$;

the student usage amount $U\_secret_{i,n}$ of teaching material n uploaded by user i is calculated as: $U\_secret_{i,n} = |\{B \mid B\_action = b\_use,\ M = n,\ U\_type = u\_student\}|$, where $n \in \{M \mid B\_action = b\_upload,\ U = i\}$;

the shared amount $Q\_share\_each_{i,n}$ of teaching material n uploaded by user i is calculated as: $Q\_share\_each_{i,n} = |\{B \mid B\_action = b\_share,\ M = n\}|$, where $n \in \{M \mid B\_action = b\_upload,\ U = i\}$;

the amount $Q\_diffuse\_each_{i,n}$ by which teaching material n uploaded by user i is browsed through shared links is calculated as: $Q\_diffuse\_each_{i,n} = |\{B \mid B\_action = b\_use,\ B\_source = b\_shared,\ M = n\}|$, where $n \in \{M \mid B\_action = b\_upload,\ U = i\}$;

the browsed amount $Q\_browse\_each_{i,n}$ of teaching material n uploaded by user i is calculated as: $Q\_browse\_each_{i,n} = |\{B \mid B\_action = b\_browse,\ M = n\}|$, where $n \in \{M \mid B\_action = b\_upload,\ U = i\}$;

the collected amount $Q\_collect\_each_{i,n}$ of teaching material n uploaded by user i is calculated as: $Q\_collect\_each_{i,n} = |\{B \mid B\_action = b\_collect,\ M = n\}|$, where $n \in \{M \mid B\_action = b\_upload,\ U = i\}$;

the downloaded amount $Q\_download\_each_{i,n}$ of teaching material n uploaded by user i is calculated as: $Q\_download\_each_{i,n} = |\{B \mid B\_action = b\_download,\ M = n\}|$, where $n \in \{M \mid B\_action = b\_upload,\ U = i\}$;

the score $Q\_score\_each_{i,n}$ of teaching material n uploaded by user i is calculated as: $Q\_score\_each_{i,n} = \{B\_weight \mid B\_action = b\_score,\ M = n\}$, where $n \in \{M \mid B\_action = b\_upload,\ U = i\}$;

the comment emotion tendency $Q\_emotion\_each_{i,n}$ of teaching material n uploaded by user i is obtained by sentiment analysis in natural language processing: when the sentiment of a comment is analyzed as positive, the comment belongs to positive emotion and is counted as 1;

the number $N\_time_{i,t}$ of teaching materials uploaded by user i in each sub-period t of the period T is calculated as: $N\_time_{i,t} = |\{B \mid B\_action = b\_upload,\ B\_time \in t,\ U = i\}|$;
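The per-material counts and the per-period upload counts above can be sketched as follows. The record layout, the descriptive dictionary keys and the choice of calendar months as the sub-periods t are assumptions made for illustration; the score and the sentiment-based count are omitted because they depend on the scoring data and on a sentiment classifier that this sketch does not model.

```python
from collections import Counter
from datetime import datetime


def material_metrics(behaviors, user_types, uploader, n):
    """Usage counts for one teaching material n uploaded by `uploader`."""
    rows = [b for b in behaviors if b["M"] == n]

    def count(pred):
        return sum(1 for b in rows if pred(b))

    return {
        "used": count(lambda b: b["B_action"] == "b_use"),
        "used_by_self": count(
            lambda b: b["B_action"] == "b_use" and b["U"] == uploader),
        "used_by_students": count(
            lambda b: b["B_action"] == "b_use"
            and user_types[b["U"]] == "u_student"),
        "shared": count(lambda b: b["B_action"] == "b_share"),
        "browsed": count(lambda b: b["B_action"] == "b_browse"),
        "collected": count(lambda b: b["B_action"] == "b_collect"),
        "downloaded": count(lambda b: b["B_action"] == "b_download"),
    }


def uploads_per_month(behaviors, uploader):
    """Number of uploads by `uploader` in each (year, month) sub-period."""
    counts = Counter()
    for b in behaviors:
        if b["B_action"] == "b_upload" and b["U"] == uploader:
            t = datetime.strptime(b["Time"], "%Y-%m-%d %H:%M:%S")
            counts[(t.year, t.month)] += 1
    return dict(counts)


def rec(u, action, m, time):
    """Helper that builds one illustrative behavior record."""
    return {"U": u, "B_action": action, "M": m, "Time": time}


# Illustrative data only.
user_types = {1: "u_teacher", 2: "u_student", 3: "u_teacher"}
behaviors = [
    rec(1, "b_upload", 10, "2018-09-01 07:00:00"),
    rec(1, "b_upload", 11, "2018-09-15 08:00:00"),
    rec(1, "b_upload", 12, "2018-10-02 09:00:00"),
    rec(1, "b_use", 10, "2018-09-02 07:00:00"),
    rec(2, "b_use", 10, "2018-09-03 07:00:00"),
    rec(3, "b_use", 10, "2018-09-04 07:00:00"),
    rec(2, "b_browse", 10, "2018-09-05 07:00:00"),
    rec(3, "b_download", 10, "2018-09-06 07:00:00"),
    rec(3, "b_share", 10, "2018-09-07 07:00:00"),
]
metrics = material_metrics(behaviors, user_types, uploader=1, n=10)
monthly = uploads_per_month(behaviors, uploader=1)
```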

Tables 7 and 8 give partial examples of the values of the intermediate variables used for the user teaching material processing capability attributes and their feature variables in the embodiment of the invention, where Table 7 gives the overall values for the teaching materials uploaded by a user and Table 8 gives the specific values for each teaching material uploaded by the user;

Table 7. Overall values for the teaching materials uploaded by a user (partial example)

For user 1, the total number of uploaded teaching materials is 139, of which 119 are picture teaching materials, 4 are audio teaching materials, 16 are video teaching materials and none are animation teaching materials; the number of material uses is 2, the number of material processing types is 2 and the number of material topics is 1; the materials have been used by 20 other users and commented on by 12 users; and the numbers of materials uploaded in each of the 12 months are 10, 46, 0, 0, 1, 0, 14, 42, 22, 4, 0 and 0;

Table 8. Specific values for each teaching material uploaded by a user (partial example)

For user 1, uploaded teaching material 1 has a usage amount of 3, a self-usage amount of 3, a student usage amount of 0 and a shared amount of 3; no browsing through shared links and no browsing are recorded; the collected amount is 1, the downloaded amount is 30, the score is 4 and the number of positive-emotion comments is 5. Uploaded teaching material 139 has a usage amount of 100, a self-usage amount of 2, a student usage amount of 80 and a shared amount of 1; no browsing through shared links and no browsing are recorded; the collected amount is 0, the downloaded amount is 63, the score is 4 and the number of positive-emotion comments is 2;

The teaching material processing capability matrix $X_i$ of each user i is then formed, i.e., $X_i = (R\_picture_i,\ R\_audio_i,\ R\_video_i,\ D\_use_i,\ D\_process_i,\ D\_topic_i,\ U\_average_i,\ U\_max_i,\ U\_self_i,\ U\_student_i,\ U\_pattern_i,\ Q\_share_i,\ Q\_collect_i,\ Q\_mcollect_i,\ Q\_download_i,\ Q\_mdownload_i,\ Q\_score_i,\ Q\_udegree_i,\ Q\_utype_i,\ Q\_emotion_i,\ Q\_cdegree_i,\ T\_fre_i,\ T\_vol_i)$;

Taking teacher 1 as an example, the values of teacher 1's teaching material processing capability evaluation matrix $X_1$ are described below;

the picture richness of teacher 1 is: $R\_picture_1 = \log_{10}(N\_picture_1) = \log_{10}(119) = 2.08$;

the audio richness of teacher 1 is: $R\_audio_1 = \log_{10}(N\_audio_1) = \log_{10}(4) = 0.60$;

the video richness of teacher 1 is: $R\_video_1 = \log_{10}(N\_video_1) = \log_{10}(16) = 1.20$;
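The three richness values can be checked with a short calculation. The sketch below follows the base-10 logarithm shown above; returning 0.0 for a format with no uploaded materials is an assumption, since the embodiment does not state how a zero count is handled.

```python
import math


def richness(count):
    """Richness of one material format: log10 of the number of uploads.

    Returning 0.0 when count is 0 is an assumption; log10(0) is undefined.
    """
    return math.log10(count) if count > 0 else 0.0


r_picture = round(richness(119), 2)  # pictures uploaded by teacher 1
r_audio = round(richness(4), 2)      # audio materials
r_video = round(richness(16), 2)     # video materials
```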

the usage diversity of teacher 1 is $D\_use_1 = 0.67$;

the processing type diversity of teacher 1 is $D\_process_1 = 0.50$;

the topic diversity of teacher 1 is $D\_topic_1 = 0.05$;

the average usage of teacher 1 is $U\_average_1 = 1.93$;

the maximum usage of teacher 1 is $U\_max_1 = 2.00$;

the total self-usage of teacher 1 is $U\_self_1 = 1.30$;

the total student usage of teacher 1 is $U\_student_1 = 2.33$;

the usage pattern of teacher 1 belongs to the first class according to the clustering result, i.e., $U\_pattern_1 = 1$;

the average sharing of teacher 1 is $Q\_share_1 = 0.07$;

the average collection of teacher 1 is $Q\_collect_1 = 0.06$;

the maximum collection of teacher 1 is $Q\_mcollect_1 = 0.00$;

the average download of teacher 1 is $Q\_download_1 = 8.94$;

the maximum download of teacher 1 is $Q\_mdownload_1 = 1.80$;

the average score of teacher 1 is $Q\_score_1 = 3.98$;

the used centrality of teacher 1 is $Q\_udegree_1 = 0.035$;

the used category of teacher 1 belongs to the third class according to the clustering result, i.e., $Q\_utype_1 = 3$;

the comment emotion tendency of teacher 1 is $Q\_emotion_1 = 0.51$;

the comment centrality of teacher 1 is $Q\_cdegree_1 = 0.003$;

the update frequency of teacher 1 is $T\_fre_1 = 11.58$;

the volatility of teacher 1 is $T\_vol_1 = 0.14$;

in the embodiment of the invention, the multi-dimensional input matrix of teacher 1 is $X_1 = (2.08,\ 0.60,\ 1.20,\ 0.67,\ 0.50,\ 0.05,\ 1.93,\ 2.00,\ 1.30,\ 2.33,\ 1,\ 0.07,\ 0.06,\ 0.00,\ 8.94,\ 1.80,\ 3.98,\ 0.035,\ 3,\ 0.51,\ 0.003,\ 11.58,\ 0.14)$;
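The fourth to sixth components of $X_1$ (0.67, 0.50 and 0.05) are consistent with dividing the number of distinct uses, processing types and topics of teacher 1's materials (2, 2 and 1) by the corresponding totals (num_of_use = 3, num_of_process = 4, Num_of_topic = 20). The sketch below assumes this ratio form; it is inferred from the reported values and is not a formula stated in this section.

```python
def diversity(n_distinct, n_total):
    """Share of the available categories that a user's materials cover.

    The ratio form is an assumption inferred from the reported values.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_distinct / n_total


d_use = round(diversity(2, 3), 2)       # 2 of the 3 material uses
d_process = round(diversity(2, 4), 2)   # 2 of the 4 processing types
d_topic = round(diversity(1, 20), 2)    # 1 of the 20 platform topics
```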

Step S3: select a user set, obtain the set of teaching material processing capability matrices corresponding to the user set, and further obtain the manually labeled capability label set of the user set. This step specifically comprises the following steps.

Step S31: select the user set according to the dimensions of the users' region, school type, school stage taught and subject taught, denote the user set U_teacher and its size NU, and obtain the teaching material processing capability matrix $X_i$ of each user in the user set to form the corresponding set of teaching material processing capability matrices, denoted $X = (X_1, X_2, \ldots, X_i, \ldots, X_{NU})^{T}$, where $X_i \in U\_teacher$;

the specific user set U_teacher constructed in the embodiment of the invention consists of the administrative users of all regions, all school types and all school stages in Z Province, and its size is NU = 5023;

in the embodiment of the invention, the input data set $X_{u\_teacher}$ for evaluating the teaching material processing capability of the user set U_teacher is the combined matrix of the matrices $X_i$ of the 5023 administrative users of all regions, all school types and all school stages in Z Province, i.e., $X_{u\_teacher} = (X_1, X_2, \ldots, X_{5023})^{T}$;

Step S32: obtain the manually labeled capability label set of the user set U_teacher, denoted $Y_{u\_teacher} = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_{NU})^{T}$, where $Y_i$ is the capability label of each user, $Y_i \in U\_teacher$;

Table 9 gives a partial example of the users' material processing capability value labels in the embodiment of the invention, where the threshold E = 20.

Table 9. User material processing capability value labels (partial example)

For teacher 1, the self-evaluation score is $St_1 = 100$ and the expert evaluation score is $Se_1 = 100$, so the final material processing score of teacher 1 is $Y_1 = (100 + 100)/2 = 100$. For teacher 2, the self-evaluation score is $St_2 = 75$ and the expert evaluation score is $Se_2 = 100$, so $e_2 = |St_2 - Se_2| = |75 - 100| = 25 > E = 20$; the evaluation score given by a second expert is 80, and since $|80 - 100| > |80 - 75|$, the final material processing score of teacher 2 is $Y_2 = (80 + 75)/2 = 77.5$. For teacher 3, the self-evaluation score is $St_3 = 100$ and the expert evaluation score is $Se_3 = 90$, so the final material processing score of teacher 3 is $Y_3 = (100 + 90)/2 = 95$. The final user material processing capability score matrix is $Y = (100,\ 77.5,\ \ldots,\ 95)^{T}$;
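The labeling rule illustrated by teachers 1 to 3 can be sketched as follows. The rule is inferred from the three worked cases: when the self-evaluation and expert scores differ by no more than E they are averaged, and otherwise a second expert score is averaged with whichever of the two original scores is closer to it. The function name `final_score` and the handling of an exact tie (falling back to the first expert score) are assumptions.

```python
def final_score(self_score, expert_score, second_expert_score=None, threshold=20):
    """Combine self-evaluation and expert scores into a capability label."""
    if abs(self_score - expert_score) <= threshold:
        return (self_score + expert_score) / 2
    if second_expert_score is None:
        raise ValueError("scores differ by more than the threshold; "
                         "a second expert score is required")
    # Average the second expert score with the closer of the two originals.
    if abs(second_expert_score - expert_score) > abs(second_expert_score - self_score):
        return (second_expert_score + self_score) / 2
    return (second_expert_score + expert_score) / 2  # also used on a tie (assumption)


y1 = final_score(100, 100)      # teacher 1
y2 = final_score(75, 100, 80)   # teacher 2: |75 - 100| = 25 > 20
y3 = final_score(100, 90)       # teacher 3
```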

Step S4: construct regression models based on a plurality of machine learning methods, where each regression model outputs an identified capability label from an input teaching material processing capability matrix; train the regression models using the set of teaching material processing capability matrices and the capability label set; and determine the optimal regression model.

The regression models comprise a multiple linear regression model, a random forest regression model, a support vector machine regression model and a BP neural network regression model;

the multiple linear regression model fits a linear regression model by minimizing the sum of squared residuals between the value labels of the sample users and the values predicted by the linear model, and the value label is calculated as follows:

where Y is the value label, C is a constant, R_picture is the picture richness feature variable, R_audio is the audio richness feature variable, R_video is the video richness feature variable, R_animation is the animation richness feature variable, D_use is the usage diversity feature variable, D_process is the processing type diversity feature variable, D_topic is the topic diversity feature variable, U_average is the average usage feature variable, U_max is the maximum usage feature variable, U_self is the total self-usage feature variable, U_student is the total student usage feature variable, U_pattern is the usage pattern feature variable, Q_share is the average sharing feature variable, Q_diffuse is the average propagation feature variable, Q_diff_rate is the propagation rate feature variable, Q_collect is the average collection feature variable, Q_mcollect is the maximum collection feature variable, Q_download is the average download feature variable and Q_mdownload is the maximum download feature variable, together with the approval rate feature variable; Q_score is the average score feature variable, Q_udegree is the used centrality feature variable, Q_utype is the used category feature variable, Q_emotion is the comment emotion tendency feature variable, Q_cdegree is the comment centrality feature variable, T_fre is the update frequency feature variable and T_vol is the volatility feature variable;

where Y is the value label, $\varphi(x_i)^{T}$ is the transposed form of the mapping of feature variable $x_i$, $K$ is a kernel function satisfying $K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j)$, and b is a constant;

the BP neural network regression model is a three-layer neural network with an input layer, a hidden layer and an output layer, each consisting of a number of neurons: the input layer has 27 neurons in total, the hidden layer has 9 neurons and the output layer has 1 neuron, the value label; regression of the value label is realized through full connection of the neurons;
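The 27-9-1 structure can be illustrated with a forward pass. This is only a sketch: the sigmoid activation in the hidden layer, the linear output neuron and the small random initial weights are assumptions, and the backpropagation training step is omitted.

```python
import math
import random


def init_network(n_in=27, n_hidden=9, seed=0):
    """Create weights for a fully connected n_in -> n_hidden -> 1 network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(x, net):
    """Forward pass: sigmoid hidden layer, linear output (both assumed)."""
    w1, b1, w2, b2 = net
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


net = init_network()
y_zero = forward([0.0] * 27, net)  # all hidden activations equal 0.5 here
```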

The sample data formed by the set of teaching material processing capability matrices and the capability label set are divided into k groups; each time, 1 group of teachers is taken from the sample data as the test set and the remaining k-1 groups of teachers are used as the training set, so that the regression model is trained k times in turn; k = 10 is set in the embodiment of the invention;

the evaluation value of the trained model is the mean absolute percentage error of the regression model, denoted MAPE, which is calculated as follows:
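The k-fold procedure and the error measure can be sketched as follows. The sketch uses the standard definition of the mean absolute percentage error, $\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$, which requires non-zero labels; the interleaved fold assignment and the placeholder model `fit_mean` are assumptions made for illustration and stand in for the four regression models.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i holds indices i, i+k, i+2k, ..."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_idx, f in enumerate(folds) if f_idx != i for j in f]
        yield train, test


def cross_validated_mape(xs, ys, fit, k=10):
    """Average MAPE over k folds; fit(train_x, train_y) returns a predict function."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        predict = fit([xs[j] for j in train], [ys[j] for j in train])
        scores.append(mape([ys[j] for j in test], [predict(xs[j]) for j in test]))
    return sum(scores) / len(scores)


def fit_mean(train_x, train_y):
    """Placeholder model (hypothetical): always predicts the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


# Illustrative data only.
xs = list(range(20))
ys = [100.0] * 20
score = cross_validated_mape(xs, ys, fit_mean, k=10)
example_mape = mape([100.0, 50.0], [90.0, 55.0])
```

Running `cross_validated_mape` once per candidate model and keeping the model with the smallest value corresponds to the selection rule of the next step.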

Step S5: compare the evaluation results of the different regression models, determine the regression model with the smallest MAPE value as the optimal regression model, and use it for dynamic identification of user teaching material processing capability.

The average MAPE value of the four regression models in the embodiment of the invention is 10.76%, which confirms overall the effectiveness of the regression models for automatically identifying user teaching material processing capability based on multi-source data fusion. The MAPE value of the regression model $L_1$ based on multiple linear regression is only 5.29%, so $L_1$ is finally selected as the optimal regression model, i.e., the final regression model for automatically identifying user teaching material processing capability;

For the optimal regression model $L_1$, stepwise regression analysis is performed with picture richness R_picture, audio richness R_audio, video richness R_video, usage diversity D_use, topic diversity D_topic, processing type diversity D_process, average usage U_average, maximum usage U_max, total self-usage U_self, total student usage U_student, usage pattern U_pattern, average sharing Q_share, average collection Q_collect, maximum collection Q_mcollect, average download Q_download, maximum download Q_mdownload, average score Q_score, used centrality Q_udegree, used category Q_utype, comment emotion tendency Q_emotion, comment centrality Q_cdegree, update frequency T_fre and volatility T_vol as independent variables, and the user teaching material processing capability value label as the dependent variable. After automatic selection by the model, 7 variables remain in the model: picture richness, usage diversity, average usage, average sharing, average collection, maximum download and volatility. The R-squared value is 0.716, meaning that these 7 variables explain 71.6% of the variation in the final score. The model passes the F test (F = 34.208, p = 0.000 < 0.05), so the model is valid. In addition, the multicollinearity of the model is checked: all VIF values in the model are smaller than 5, which means that there is no collinearity problem; and the D-W value (D-W = 2.016) is close to 2, so the model has no autocorrelation, there is no association between the sample data, and the model is good. Table 10 shows the specific results of the stepwise regression model $L_1$ of the embodiment of the invention.

Table 10. Specific results of the stepwise regression model $L_1$ of the invention

The final regression equation is: final score $Y_{U\_teacher} = \ldots \times R\_picture + 26.389 \times D\_use + 16.463 \times U\_average - 1.153 \times Q\_share + 19.064 \times Q\_collect + 4.927 \times Q\_mdownload - 35.233 \times T\_vol$;

Taking teacher 10626, a male primary school mathematics teacher in Z Province who is not in the sample set, as an example, the automatic evaluation of teaching material processing capability is described below;

for teacher 10626, the picture richness $R\_picture_{10626}$ is 1.08, the usage diversity $D\_use_{10626}$ is 1.00, the average usage $U\_average_{10626}$ is 0.54, the average sharing $Q\_share_{10626}$ is 3.93, the average collection $Q\_collect_{10626}$ is 0.00, the maximum download $Q\_mdownload_{10626}$ is 0.49 and the volatility $T\_vol_{10626}$ is 0.31;

based on the capability evaluation model $L_{u\_teacher}$, the teaching material processing capability score of user 10626 is automatically calculated as 71.14.

Step S6: dynamically identify user teaching material processing capability using the trained optimal regression model.

Collecting user update data at time t;

dynamically updating the capability label of the user based on the user updating data and the trained optimal regression model;

The multi-source procedural data on the teaching material processing capability of teacher 10626 are updated with half a year as the update period C. On February 29, 2020, teacher 10626 has picture richness $R\_picture_{10626}$ of 1.28, usage diversity $D\_use_{10626}$ of 1.00, average usage $U\_average_{10626}$ of 0.64, average sharing $Q\_share_{10626}$ of 3.93, average collection $Q\_collect_{10626}$ of 0.00, maximum download $Q\_mdownload_{10626}$ of 0.49 and volatility $T\_vol_{10626}$ of 0.37, from which the final score is recalculated; on August 30, 2020, teacher 10626 has picture richness $R\_picture_{10626}$ of 2.68, usage diversity $D\_use_{10626}$ of 1.00, average usage $U\_average_{10626}$ of 0.81, average sharing $Q\_share_{10626}$ of 3.23, average collection $Q\_collect_{10626}$ of 0.01, maximum download $Q\_mdownload_{10626}$ of 0.49 and volatility $T\_vol_{10626}$ of 0.38, giving a final score of 84.81;

The embodiment of the invention further discloses a system for automatically identifying user teaching material processing capability based on multi-source data fusion, which comprises:

The implementation principle and technical effect of the system are similar to those of the method, and are not repeated here.

It should be noted that, in any of the above embodiments, the steps of the method need not be executed in the order of their sequence numbers; unless the execution logic requires a particular order, they may be executed in any other feasible order.

It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. A method for automatically identifying processing capacity of a user teaching material based on multi-source data fusion is applied to a teaching platform supporting processing or management of the teaching material, and is characterized by comprising the following steps:

s1, defining attributes of processing capacity of teaching materials of users in advance, wherein each attribute comprises a characteristic variable; the attribute of the processing capability of the teaching materials comprises richness, diversity, availability, usefulness and timeliness;

wherein the richness comprises 4 feature variables of picture richness, audio richness, video richness and animation richness;

The timeliness comprises 2 characteristic variables of update frequency and volatility;

2. The automatic recognition method for processing and processing capabilities of user teaching materials based on multi-source data fusion according to claim 1, wherein the user data comprises user basic data, teaching material label data, teaching material use behavior data, teaching material scoring behavior data and teaching material comment behavior data;

3. The automatic recognition method for processing capacity of user teaching materials based on multi-source data fusion according to claim 1, wherein the behavior dimension analysis method comprises descriptive statistical analysis and K-means cluster analysis, and is mainly used for calculating picture richness, audio richness, video richness, animation richness, usage diversity, processing type diversity, average usage, maximum usage, self-usage, student usage, usage pattern, average sharing, average propagation, propagation rate, average collection, maximum collection, average downloading amount, maximum downloading amount, approval rate, average score, used category, update frequency and fluctuation characteristic variable;

4. The automatic recognition method for processing and processing capabilities of user teaching materials based on multi-source data fusion according to claim 1, wherein the step S3 comprises the following steps:

5. the automatic recognition method for processing capacity of user teaching materials based on multi-source data fusion according to claim 1, wherein the regression models comprise a multiple linear regression model, a random forest regression model, a support vector machine regression model and a BP neural network regression model;

where Y is the value label, $\varphi(x_i)^{T}$ is the transposed form of the mapping of feature variable $x_i$, $K$ is a kernel function satisfying $K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j)$, and b is a constant;

6. The automatic recognition method for processing and processing capabilities of user teaching materials based on multi-source data fusion according to claim 1, wherein the step S4 comprises the following steps:

7. The automatic recognition method for processing and processing capabilities of user teaching materials based on multi-source data fusion as set forth in claim 1, further comprising the step of S6:

collecting user update data at time t;

8. The automatic recognition system for processing and processing capabilities of the user teaching materials based on multi-source data fusion is applied to a teaching platform supporting processing and management of the teaching materials, and is characterized by comprising the following components:

the predefining module is used for predefining attributes of processing capacity of the user teaching materials and feature variables contained in each attribute; the attribute of the processing capability of the teaching materials comprises richness, diversity, availability, usefulness and timeliness;