CN112699933A - Automatic identification method and system for processing capacity of user teaching material - Google Patents

Info

Publication number
CN112699933A
Authority
CN
China
Prior art keywords
user
teaching
teaching material
average
characteristic variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011583583.7A
Other languages
Chinese (zh)
Other versions
CN112699933B (en)
Inventor
吴砥
陈敏
李亚婷
徐建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN202011583583.7A
Publication of CN112699933A
Application granted
Publication of CN112699933B
Legal status: Active
Anticipated expiration

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a method and a system for automatically identifying a user's teaching material processing capacity based on multi-source data fusion. The method comprises the following steps: S1, predefining the attributes of the user's teaching material processing capacity and the characteristic variables contained in each attribute, wherein the attributes comprise richness, diversity, usability, usefulness and timeliness; S2, collecting user data from the teaching platform and calculating a teaching material processing capacity matrix for each user from the user data; S3, acquiring sample data for a user set; S4, constructing regression models based on multiple machine learning methods, training the regression models with the sample data, and determining an optimal regression model; and S5, dynamically identifying the user's teaching material processing capacity with the trained optimal regression model. The invention enables intelligent automatic identification of the user's teaching material processing capacity.

Description

Automatic identification method and system for processing capacity of user teaching material
Technical Field
The invention belongs to the field of education informatization, and particularly relates to a method and a system for automatically identifying processing capacity of a user teaching material based on multi-source data fusion.
Background
With the development of computer technology, teaching platforms that support teaching have become important information carriers in education. Such platforms include, but are not limited to, regional education resource public service platforms, online teaching platforms, online teacher professional development platforms, online training platforms and education management platforms. In platform-based teaching, the processing of teaching materials and the identification of users' teaching material processing capacity are very important.
At present, users' teaching material processing capacity is still identified mainly through questionnaires; for example, users evaluate themselves with scales or test questions. This approach captures only the user's current state, is subjective and requires a high degree of cooperation. It also ignores the objective process data generated when users process teaching materials, which leads to inaccurate identification, low identification efficiency and low data utilization. How to use computer technology and the data that users generate on the teaching platform to achieve more objective, more accurate and more continuous intelligent automatic identification is therefore an important problem. The prior art offers no mature computer-based automatic identification technology.
Disclosure of Invention
In view of at least one defect or improvement requirement in the prior art, the invention provides a method and a system for automatically identifying a user's teaching material processing capacity based on multi-source data fusion, which enable intelligent automatic identification of the user's teaching material processing capacity.
In order to achieve the above object, according to a first aspect of the present invention, there is provided a method for automatically identifying processing capability of a user teaching material based on multi-source data fusion, which is applied to a teaching platform supporting processing or management of the teaching material, and includes the steps of:
S1, predefining the attributes of the user's teaching material processing capacity and the characteristic variables contained in each attribute;
S2, collecting user data from the teaching platform, performing multi-source data fusion on the user data with analysis methods based on the behavior, content and social dimensions, and determining the characteristic variable values of each attribute of each user's teaching material processing capacity, wherein the one-dimensional array formed by the values of all characteristic variables of all attributes constitutes the teaching material processing capacity matrix of that user;
S3, selecting a user set, acquiring the teaching material processing capacity matrix set corresponding to the user set, and acquiring a manually labeled capability label set of the user set;
S4, constructing multiple regression models based on multiple machine learning methods, wherein each regression model outputs an identified capability label from an input teaching material processing capacity matrix, training the regression models with the teaching material processing capacity matrix set and the capability label set, and determining an optimal regression model;
and S5, dynamically identifying the user's teaching material processing capacity with the trained optimal regression model.
Preferably, the attributes of the processing capability of the teaching materials comprise richness, diversity, usability, usefulness and timeliness;
the richness is used for expressing quantity distribution characteristics of teaching materials in different file formats;
the diversity is used for representing the distribution characteristics of the purpose and the processing type of the teaching materials;
the usability is used for representing the use characteristics of the uploader of the teaching materials on the teaching materials;
the usefulness is used for representing the recognition characteristics of the teaching materials by other people except the uploader of the teaching materials;
the timeliness is used for representing the fluctuation characteristics of the updating frequency of the teaching materials.
Preferably, the richness comprises 4 characteristic variables of picture richness, audio richness, video richness and animation richness;
the diversity comprises 3 characteristic variables of use diversity, processing type diversity and theme diversity;
the usability comprises 5 characteristic variables of average usage, maximum usage, total self-usage, total student usage and usage pattern;
the usefulness comprises 13 characteristic variables of average share quantity, average spread quantity, spread rate, average collection quantity, maximum collection quantity, average download quantity, maximum download quantity, recognition rate, average score, used centrality, used category, comment emotional tendency and comment centrality;
the timeliness comprises 2 characteristic variables of updating frequency and volatility.
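The five attributes and their 27 characteristic variables can be flattened into a fixed-order array per user. The following is a minimal sketch of that idea, not the patented implementation; the identifier spellings, the ordering and the helper name `to_feature_vector` are assumptions made for illustration.

```python
# Attribute -> ordered characteristic variables, as listed in the text.
# Identifier spellings and ordering are illustrative assumptions.
ATTRIBUTES = {
    "richness": ["R_picture", "R_audio", "R_video", "R_animation"],
    "diversity": ["D_use", "D_process", "D_topic"],
    "usability": ["U_average", "U_max", "U_self", "U_student", "U_pattern"],
    "usefulness": [
        "Q_share", "Q_difference", "Q_difference_rate", "Q_collect",
        "Q_mcollect", "Q_download", "Q_mdownload", "Q_recognition",
        "Q_score", "Q_udegree", "Q_utype", "Q_emotion", "Q_cdegree",
    ],
    "timeliness": ["T_fre", "T_vol"],
}

# Flat, fixed ordering used for every user so that columns line up.
FEATURE_ORDER = [name for names in ATTRIBUTES.values() for name in names]


def to_feature_vector(values):
    """Flatten a {variable: value} mapping into a fixed-order list of floats."""
    missing = [name for name in FEATURE_ORDER if name not in values]
    if missing:
        raise KeyError(f"missing characteristic variables: {missing}")
    return [float(values[name]) for name in FEATURE_ORDER]
```

Keeping one shared ordering means the per-user arrays can be stacked row by row into the sample set used for training.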
Preferably, the user data comprises user basic data, teaching material basic data, teaching material label data, teaching material use behavior data, teaching material grading behavior data and teaching material comment behavior data;
the user basic data comprises a user id, a user name, a user role, a user gender, a user age, a region, a school type, a school stage taught and a subject taught;
the teaching material basic data comprises a teaching material id, a teaching material name, a material form, a material purpose and a processing type;
the teaching material label data comprises a teaching material id, a label name and a label weight;
the teaching material use behavior data comprises use behavior id, users, use behavior actions, teaching materials, behavior time and behavior sources;
the teaching material grading behavior data comprise grading behavior id, users, teaching materials, grading score and behavior time;
the teaching material comment behavior data comprise comment behavior id, users, teaching materials, comment contents and behavior time.
Preferably, the behavior-dimension analysis method comprises descriptive statistical analysis and K-means cluster analysis, and is mainly used for calculating the picture richness, audio richness, video richness, animation richness, use diversity, processing type diversity, average usage, maximum usage, total self-usage, total student usage, usage pattern, average share quantity, average spread quantity, spread rate, average collection quantity, maximum collection quantity, average download quantity, maximum download quantity, recognition rate, average score, used category, update frequency and volatility characteristic variables;
the content-dimension analysis method comprises multidimensional scaling analysis and emotional tendency analysis, and is mainly used for calculating the theme diversity and comment emotional tendency characteristic variables;
the social-dimension analysis method comprises social network analysis, and is mainly used for calculating the used centrality and comment centrality characteristic variables.
Preferably, the S3 includes the steps of:
S31, selecting the user set according to the dimensions of region, school type, school stage taught and subject taught, denoting the user set as U_teacher and the number of users in it as NU, and acquiring the teaching material processing capacity matrix $X_i$ of each user in the user set to form the corresponding teaching material processing capacity matrix set $X = (X_1, X_2, \ldots, X_i, \ldots, X_{NU})^T$, where $X_i$ belongs to user $i$ in U_teacher;
S32, acquiring the manually labeled capability label set of the user set U_teacher, denoted $Y_{u\_teacher} = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_{NU})^T$, where $Y_i$ is the capability label of user $i$ in U_teacher;
The capability label $Y_i$ is determined from the user's self-labeling data and expert labeling data. First, the error $e_i = |St_i - Se_i|$ between the user's self-labeling data $St_i$ and the first expert labeling data $Se_i$ is calculated. If $e_i$ is less than a preset threshold $E$, $Y_i$ is the average of the two. If $e_i$ is greater than $E$, second expert labeling data $Sa_i$ is obtained, the distances from $Sa_i$ to $St_i$ and to $Se_i$ are calculated, and $Y_i$ is the average of $Sa_i$ and whichever of the two scores is closer to it. The calculation formula is as follows:
$$Y_i = \begin{cases} \dfrac{St_i + Se_i}{2}, & e_i < E \\ \dfrac{Sa_i + St_i}{2}, & e_i > E \text{ and } |Sa_i - St_i| < |Sa_i - Se_i| \\ \dfrac{Sa_i + Se_i}{2}, & e_i > E \text{ and } |Sa_i - St_i| > |Sa_i - Se_i| \end{cases}$$
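The labeling rule described above can be sketched as a small function. This is an illustrative sketch only: the function name, the default threshold and the handling of the two boundary cases (an error exactly equal to the threshold, and a second expert score equally distant from both earlier scores) are assumptions, since the text does not specify them.

```python
def capability_label(st, se, sa=None, threshold=5.0):
    """Combine self-labeling and expert scores into one capability label.

    st: user self-labeling score (St_i)
    se: first expert score (Se_i)
    sa: second expert score (Sa_i), only needed when st and se disagree
    threshold: the critical value E (the default here is an arbitrary assumption)
    """
    error = abs(st - se)
    if error < threshold:
        # The two scores agree closely enough: average them.
        return (st + se) / 2
    # Assumption: an error exactly equal to E is treated like an error above E.
    if sa is None:
        raise ValueError("a second expert score is required when the error reaches the threshold")
    # Assumption: on a tie in distance, the self-labeling score is used.
    closer = st if abs(sa - st) <= abs(sa - se) else se
    return (sa + closer) / 2
```

For example, scores of 60 and 90 with a threshold of 5 trigger the second expert; if that expert gives 85, the label is the average of 85 and 90.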
preferably, the multiple regression models comprise a multiple linear regression model, a random forest regression model, a support vector machine regression model and a BP neural network regression model;
the multiple linear regression model is a linear regression model fitted by minimizing the sum of squared residuals between the value labels of the sample users and the values predicted by the linear model, and the calculation formula of the value label is as follows:
Figure BDA0002864911810000042
wherein Y is the value label, C is a constant, R_picture is the picture richness characteristic variable, R_audio is the audio richness characteristic variable, R_video is the video richness characteristic variable, R_animation is the animation richness characteristic variable, D_use is the use diversity characteristic variable, D_process is the processing type diversity characteristic variable, D_topic is the theme diversity characteristic variable, U_average is the average usage characteristic variable, U_max is the maximum usage characteristic variable, U_self is the total self-usage characteristic variable, U_student is the total student usage characteristic variable, U_pattern is the usage pattern characteristic variable, Q_share is the average share quantity characteristic variable, Q_difference is the average spread quantity characteristic variable, Q_difference_rate is the spread rate characteristic variable, Q_collect is the average collection quantity characteristic variable, Q_mcollect is the maximum collection quantity characteristic variable, Q_download is the average download quantity characteristic variable, Q_mdownload is the maximum download quantity characteristic variable, Q_recognition is the recognition rate characteristic variable, Q_score is the average score characteristic variable, Q_udegree is the used centrality characteristic variable, Q_utype is the used category characteristic variable, Q_emotion is the comment emotional tendency characteristic variable, Q_cdegree is the comment centrality characteristic variable, T_fre is the update frequency characteristic variable, T_vol is the volatility characteristic variable,
Figure BDA0002864911810000051
and ω1~ω26 are the weight coefficients obtained by training, and ε is the error;
the random forest regression model is an algorithm model that uses CART decision trees as weak learners with randomly selected features; T weak learners are trained independently on T rounds of sampling, and the final result is obtained by combining the regression results of the T weak learners with a weighted average;
the support vector machine regression model maps the input teaching material processing capacity matrix into a high-dimensional feature space through a kernel function to perform regression of the value label, and the calculation formula of the value label is as follows:
Figure BDA0002864911810000052
wherein Y is the value label,
Figure BDA0002864911810000053
and $\alpha_i$ are Lagrange coefficients, x is the characteristic variable of the input user teaching material processing attributes,
Figure BDA0002864911810000054
is the transpose of the characteristic variable $x_i$,
Figure BDA0002864911810000055
is the kernel function, which satisfies
Figure BDA0002864911810000056
and b is a constant;
the BP neural network regression model is a three-layer neural network with an input layer, a hidden layer and an output layer, each layer being composed of neurons; the input layer receives the 27 characteristic variables of the user's teaching material processing attributes, the hidden layer has 9 neurons, the output layer produces 1 value label, and regression of the value label is achieved through full connection between the neurons.
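To make the shape of the BP network concrete, here is a minimal forward-pass sketch with 27 inputs, 9 hidden neurons and 1 output. Several points are assumptions not stated in the text: the reading of "9" as the hidden neuron count, the sigmoid hidden activation, the linear output, and the random initialization range. Training by back-propagation is omitted.

```python
import math
import random


def init_network(n_in=27, n_hidden=9, seed=0):
    """Create small random weights for a fully connected 27-9-1 network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(x, params):
    """Compute one value label from one feature vector x (length n_in)."""
    w1, b1, w2, b2 = params
    hidden = []
    for row, bias in zip(w1, b1):
        z = sum(w * xi for w, xi in zip(row, x)) + bias
        hidden.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid (assumed)
    # Linear output neuron (assumed), suitable for regression.
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```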
Preferably, the S4 includes the steps of:
dividing the sample data formed by the teaching material processing capacity matrix set and the capability label set into k groups, taking 1 of the k groups as the test set and the remaining k-1 groups as the training set each time, and training the regression model k times in turn;
the evaluation value of the training is the mean absolute percentage error of the regression model, denoted MAPE, which is calculated as follows:
$$\mathrm{MAPE} = \frac{1}{M}\sum_{j=1}^{M}\left|\frac{y'_j - y_j}{y_j}\right| \times 100\%$$
wherein M is the number of users in the test set, $y'_j$ is the predicted value of the capability label of user j, and $y_j$ is the actual value of the capability label of user j;
and comparing the evaluation results of the different regression models, and determining the regression model with the smallest MAPE value as the optimal regression model.
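The selection procedure (k-fold evaluation, MAPE scoring, choosing the smallest MAPE) can be sketched in plain Python as follows. The function names, the strided fold assignment, the averaging of MAPE across folds and the two toy models are assumptions for illustration; the text does not say how the k per-fold results are aggregated.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; each of the k groups is the test set once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test


def cross_validated_mape(fit, X, y, k):
    """Train k times and return the mean MAPE over the k test groups (assumed aggregation)."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        predictions = [predict(X[i]) for i in test]
        scores.append(mape([y[i] for i in test], predictions))
    return sum(scores) / len(scores)


def select_best(models, X, y, k=5):
    """Return the name of the model with the smallest MAPE and all scores."""
    results = {name: cross_validated_mape(fit, X, y, k) for name, fit in models.items()}
    return min(results, key=results.get), results


# Hypothetical stand-ins for real regression models: each `fit` returns a predictor.
def fit_mean(train_X, train_y):
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


def fit_double(train_X, train_y):
    return lambda x: 2.0 * x[0]
```

In practice the `models` mapping would hold the four regression models named in the text; the sample count must be at least k so that no test group is empty.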
Preferably, the method further comprises step S6:
collecting user update data at time t;
and dynamically updating the capability labels of the users based on the user updating data and the trained optimal regression model.
According to a second aspect of the present invention, there is provided a system for automatically identifying processing capability of a user teaching material based on multi-source data fusion, which is applied to a teaching platform supporting processing or management of the teaching material, and comprises:
the pre-defining module is used for pre-defining attributes of processing capacity of the user teaching materials and characteristic variables contained in each attribute;
the data acquisition module is used for collecting user data from the teaching platform, performing multi-source data fusion on the user data across the behavior, content and social dimensions to determine the characteristic variable values of each attribute of each user's teaching material processing capacity, and forming the one-dimensional array of the values of all characteristic variables of all attributes into the teaching material processing capacity matrix of that user;
the sample acquisition module is used for setting a screening condition to select a user set, acquiring the teaching material processing capacity matrix set corresponding to the user set, and acquiring a manually labeled capability label set of the user set;
the training module is used for constructing multiple types of regression models, wherein each regression model outputs an identified capability label from an input teaching material processing capacity matrix, training the regression models with the teaching material processing capacity matrix set and the capability label set, and determining an optimal regression model;
and the recognition module is used for dynamically recognizing the processing capacity of the user teaching materials by utilizing the trained optimal regression model.
In summary, the advantages and positive effects of the invention are:
(1) By making full use of the multi-dimensional process data in the education platform, the invention enables intelligent automatic identification of the user's teaching material processing capacity that is more objective, more accurate and more continuous. More time is needed only when the model is trained in advance; identification with the trained model is fast and efficient.
(2) In addition, a time dimension is introduced and automatic dynamic updating of the identification is supported, which facilitates large-scale, continuous evaluation of, for example, users' teaching material processing capacity and teachers' information literacy.
(3) The accuracy of intelligent automatic identification can be further improved by optimizing the capability attribute/characteristic variable and the type of the collected data.
Drawings
Fig. 1 is a general flowchart of a dynamic evaluation method for processing and processing capabilities of user teaching materials based on multi-source data fusion according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 is a general flowchart of a method for automatically identifying processing capability of a user teaching material based on multi-source data fusion according to an embodiment of the present invention, where the method is applied to a teaching platform supporting processing or management of teaching materials, and the method includes the following steps:
S1, predefining the attributes of the user's teaching material processing capacity and the characteristic variables contained in each attribute.
The attributes of the processing capacity of the teaching materials comprise richness, diversity, usability, usefulness and timeliness;
the richness is used for expressing quantity distribution characteristics of teaching materials in different file formats;
the diversity is used for representing the distribution characteristics of the purpose and the processing type of the teaching materials;
the usability is used for representing the use characteristics of the uploader of the teaching materials on the teaching materials;
the usefulness is used for representing the recognition characteristics of the teaching materials by other people except the uploader of the teaching materials;
the timeliness is used for representing the fluctuation characteristics of the updating frequency of the teaching materials;
the richness comprises 4 characteristic variables: picture richness R_picture, audio richness R_audio, video richness R_video and animation richness R_animation;
the picture richness R_picture is the log-normalized value of the number N_picture of picture teaching materials uploaded by a user, and for any user i it is calculated as: $R\_picture_i = \log_{10}(N\_picture_i)$;
the audio richness R_audio is the log-normalized value of the number N_audio of audio teaching materials uploaded by a user, and for any user i it is calculated as: $R\_audio_i = \log_{10}(N\_audio_i)$;
the video richness R_video is the log-normalized value of the number N_video of video teaching materials uploaded by a user, and for any user i it is calculated as: $R\_video_i = \log_{10}(N\_video_i)$;
the animation richness R_animation is the log-normalized value of the number N_animation of animation teaching materials uploaded by a user, and for any user i it is calculated as: $R\_animation_i = \log_{10}(N\_animation_i)$;
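The four richness variables share one formula, the base-10 logarithm of an upload count. A minimal sketch follows; the function names and input layout are illustrative, and returning 0.0 for a count of zero is an assumption, because the text does not say how a user with no uploads of a given format is handled (the logarithm of zero is undefined).

```python
import math


def richness(count):
    """Base-10 log normalization of the number of uploaded materials of one format."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # Assumption: zero uploads map to 0.0 (note that a count of 1 also gives 0.0).
    return math.log10(count) if count > 0 else 0.0


def richness_features(counts):
    """counts: mapping with 'picture', 'audio', 'video' and 'animation' upload counts."""
    return {
        f"R_{fmt}": richness(counts.get(fmt, 0))
        for fmt in ("picture", "audio", "video", "animation")
    }
```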
the diversity comprises 3 characteristic variables: use diversity D_use, processing type diversity D_process and theme diversity D_topic;
the use diversity D_use is the proportion of the number of purposes N_use covered by the teaching materials uploaded by the user to the total number of teaching material purposes Num_of_use, and for any user i it is calculated as:
$$D\_use_i = \frac{N\_use_i}{Num\_of\_use}$$
the processing type diversity D_process is the proportion of the number of processing types N_process covered by the teaching materials uploaded by the user to the total number of teaching material processing types Num_of_process, and for any user i it is calculated as:
$$D\_process_i = \frac{N\_process_i}{Num\_of\_process}$$
the theme diversity D_topic is the proportion of the number of themes N_topic covered by the teaching materials uploaded by the user to the total number of teaching material themes Num_of_topics, and for any user i it is calculated as:
$$D\_topic_i = \frac{N\_topic_i}{Num\_of\_topics}$$
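Each diversity variable is a coverage ratio: the number of distinct categories a user's materials touch, divided by the number of categories that exist on the platform. The sketch below illustrates this; the function name and the example category names are hypothetical.

```python
def diversity(categories_of_materials, all_categories):
    """Share of platform categories covered by one user's uploaded materials.

    categories_of_materials: one category per uploaded material (duplicates allowed)
    all_categories: every category defined on the platform
    """
    catalogue = set(all_categories)
    if not catalogue:
        raise ValueError("the platform must define at least one category")
    # Count each category once, and ignore values outside the catalogue.
    covered = set(categories_of_materials) & catalogue
    return len(covered) / len(catalogue)
```

The same function can serve D_use, D_process and D_topic by passing purposes, processing types or themes respectively.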
the usability comprises 5 characteristic variables: average usage U_average, maximum usage U_max, total self-usage U_self, total student usage U_student and usage pattern U_pattern;
the average usage U_average is the ratio of the sum of the usage U_each of all teaching materials uploaded by a user to the total number N_all of teaching materials uploaded by the user, and for any user i it is calculated as:
$$U\_average_i = \frac{\sum_{j=1}^{N\_all_i} U\_each_{ij}}{N\_all_i}$$
the maximum usage U_max is the log-normalized value of the maximum of the usage U_each of each teaching material uploaded by a user, and for any user i it is calculated as:
$$U\_max_i = \log_{10}\left(\max_{j} U\_each_{ij}\right)$$
the total self-usage U_self is the log-normalized value of the sum of the self-usage U_teach of each teaching material uploaded by a user, and for any user i it is calculated as:
$$U\_self_i = \log_{10}\left(\sum_{j=1}^{N\_all_i} U\_teach_{ij}\right)$$
the total student usage U_student is the log-normalized value of the sum of the student usage U_search of each teaching material uploaded by a user, and for any user i it is calculated as:
$$U\_student_i = \log_{10}\left(\sum_{j=1}^{N\_all_i} U\_search_{ij}\right)$$
the usage pattern U_pattern is the result of k-means clustering of the usage patterns of the teaching materials uploaded by all users of the teaching platform;
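The four numeric usability variables can be computed from per-material counts as sketched below. The record layout (`use`, `self_use`, `student_use` keys) and the treatment of zero totals are assumptions; the clustered usage pattern U_pattern is not covered here.

```python
import math


def _log10_or_zero(value):
    # Assumption: a zero total maps to 0.0 because log10(0) is undefined.
    return math.log10(value) if value > 0 else 0.0


def usability_features(materials):
    """materials: list of dicts with 'use', 'self_use' and 'student_use' counts,
    one dict per teaching material uploaded by the user (hypothetical layout)."""
    if not materials:
        raise ValueError("the user has no uploaded materials")
    usage = [m["use"] for m in materials]
    return {
        "U_average": sum(usage) / len(materials),
        "U_max": _log10_or_zero(max(usage)),
        "U_self": _log10_or_zero(sum(m["self_use"] for m in materials)),
        "U_student": _log10_or_zero(sum(m["student_use"] for m in materials)),
    }
```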
the usefulness comprises 13 characteristic variables: average share quantity Q_share, average spread quantity Q_difference, spread rate Q_difference_rate, average collection quantity Q_collect, maximum collection quantity Q_mcollect, average download quantity Q_download, maximum download quantity Q_mdownload, recognition rate Q_recognition, average score Q_score, used centrality Q_udegree, used category Q_utype, comment emotional tendency Q_emotion and comment centrality Q_cdegree;
the average share quantity Q_share is the ratio of the sum of the share quantity Q_share_each of each teaching material uploaded by a user to the total number N_all of teaching materials uploaded by the user, and for any user i it is calculated as:
$$Q\_share_i = \frac{\sum_{j=1}^{N\_all_i} Q\_share\_each_{ij}}{N\_all_i}$$
the average spread quantity Q_difference is the ratio of the sum of the number of views Q_difference_each that each teaching material uploaded by the user received through sharing links to the total number N_all of teaching materials uploaded by the user, and for any user i it is calculated as:
$$Q\_difference_i = \frac{\sum_{j=1}^{N\_all_i} Q\_difference\_each_{ij}}{N\_all_i}$$
the spread rate Q_difference_rate is the ratio of the average spread quantity Q_difference of the teaching materials uploaded by the user to the average share quantity Q_share of the teaching materials uploaded by the user, and for any user i it is calculated as:
$$Q\_difference\_rate_i = \frac{Q\_difference_i}{Q\_share_i}$$
the average collection quantity Q_collect is the ratio of the sum of the collected quantity Q_collect_each of each teaching material uploaded by the user to the total number N_all of teaching materials uploaded by the user, and for any user i it is calculated as:
$$Q\_collect_i = \frac{\sum_{j=1}^{N\_all_i} Q\_collect\_each_{ij}}{N\_all_i}$$
the maximum collection quantity Q_mcollect is the log-normalized value of the maximum of the collected quantity Q_collect_each of each teaching material uploaded by a user, and for any user i it is calculated as:
$$Q\_mcollect_i = \log_{10}\left(\max_{j} Q\_collect\_each_{ij}\right)$$
the average download quantity Q_download is the ratio of the sum of the downloaded quantity Q_download_each of each teaching material uploaded by a user to the total number N_all of teaching materials uploaded by the user, and for any user i it is calculated as:
$$Q\_download_i = \frac{\sum_{j=1}^{N\_all_i} Q\_download\_each_{ij}}{N\_all_i}$$
the maximum download quantity Q_mdownload is the log-normalized value of the maximum of the downloaded quantity Q_download_each of each teaching material uploaded by a user, and for any user i it is calculated as:
$$Q\_mdownload_i = \log_{10}\left(\max_{j} Q\_download\_each_{ij}\right)$$
the recognition rate Q_recognition is the ratio of the sum of the collected quantity Q_collect_each and the downloaded quantity Q_download_each of each teaching material uploaded by a user to the browsed quantity Q_browse_each of each teaching material, and for any user i it is calculated as:
Figure BDA0002864911810000112
the average score Q_score is the ratio of the sum of the scores Q_score_each of all teaching materials uploaded by a user to the total number N_all of teaching materials uploaded by the user, and for any user i it is calculated as:
$$Q\_score_i = \frac{\sum_{j=1}^{N\_all_i} Q\_score\_each_{ij}}{N\_all_i}$$
the used centrality Q_udegree is the ratio of the total number U_use of other users who have used the teaching materials uploaded by the user to the number of users U minus one, and for any user i it is calculated as:
$$Q\_udegree_i = \frac{U\_use_i}{U - 1}$$
the used category Q_utype is the result of k-means clustering of the ways in which the teaching materials uploaded by all users of the teaching platform are used by others;
the comment emotional tendency Q_emotion is the ratio of the sum of the positive-emotion comments Q_emotion_each on all teaching materials uploaded by the user to the total number N_all of teaching materials uploaded by the user, and for any user i it is calculated as:
$$Q\_emotion_i = \frac{\sum_{j=1}^{N\_all_i} Q\_emotion\_each_{ij}}{N\_all_i}$$
the comment centrality Q_cdegree is the ratio of the total number U_comment of other users who have commented on the teaching materials uploaded by the user to the number of users U minus one, and for any user i it is calculated as:
$$Q\_cdegree_i = \frac{U\_comment_i}{U - 1}$$
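Most usefulness variables reduce to three patterns: a per-material average, a ratio of two averages, and a degree centrality normalized by the number of other users. The sketch below shows one helper per pattern; the function names are illustrative, and returning 0.0 for the spread rate when nothing was shared is an assumption not covered by the text.

```python
def mean_per_material(counts):
    """Average of a per-material count (shares, collections, downloads, scores...)."""
    if not counts:
        raise ValueError("the user has no uploaded materials")
    return sum(counts) / len(counts)


def spread_rate(average_spread, average_share):
    """Views obtained through sharing links per share."""
    # Assumption: with no shares at all, the rate is reported as 0.0.
    return average_spread / average_share if average_share else 0.0


def degree_centrality(other_user_ids, total_users):
    """Distinct other users linked to this user, divided by (U - 1)."""
    if total_users < 2:
        raise ValueError("at least two users are needed")
    return len(set(other_user_ids)) / (total_users - 1)
```

`degree_centrality` serves both the used centrality (pass the ids of users who used the materials) and the comment centrality (pass the ids of users who commented).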
the balance comprises an updating frequency T _ fre and fluctuation T _ vol2 characteristic variables;
the updating frequency T _ fre refers to the average times of uploading the teaching materials N _ time in each time period T by the user in the time period T, and the calculating formula of the updating frequency of the teaching materials of any user i is as follows:
Figure BDA0002864911810000121
the volatility T _ vol refers to the ratio of the teaching material N _ time uploaded by the user in each time period T in the time period T and the reference percentage B of the teaching material in the time TtThe calculation formula of the fluctuation of the teaching material of any user i is as follows:
Figure BDA0002864911810000122
in the embodiment of the invention, T is 12, and the standard percentile of the provided teaching materials in 12 months is as follows: b ═ 8%, 10%, 8%, 8%, 8%, 8%, 8%, 10%, 8%, 8%, 8% };
Step S2: collecting user data from the teaching platform and performing multi-source data fusion, using analysis methods based on the behavior, content and social dimensions of the user data, to determine the value of each characteristic variable of each attribute of every user's teaching material processing capability; the one-dimensional array formed by the values of all characteristic variables of all attributes constitutes that user's teaching material processing capability matrix.
The behavior-dimension analysis methods comprise descriptive statistical analysis and K-means cluster analysis, and are mainly used to calculate the picture richness, audio richness, video richness, animation richness, use diversity, processing type diversity, average usage amount, maximum usage amount, self-usage total amount, student usage total amount, usage pattern, average sharing amount, average spread amount, spread rate, average collection amount, maximum collection amount, average download amount, maximum download amount, recognition rate, average score, used category, update frequency and volatility characteristic variables.
The content-dimension analysis methods comprise multidimensional scaling analysis and emotional tendency analysis, and are mainly used to calculate the theme diversity and comment emotional tendency characteristic variables;
the social-dimension analysis methods comprise social network analysis, and are mainly used to calculate the used centrality and comment centrality characteristic variables. The teaching platform includes education application support platforms such as regional education resource public service platforms, online teaching platforms, online teaching-research platforms, online training platforms and education management platforms; the embodiment of the invention uses the online learning space of the Z province education resource public service platform, and the data were collected on August 30, 2019;
the user data comprises user basic data, teaching material basic data, teaching material label data, teaching material usage behavior data, teaching material scoring behavior data and teaching material comment behavior data;
the user basic data comprises the user id, user name, user role, user gender, user age, area, school type, section taught and subject taught, and can be represented by U = (U_id, U_name, U_type, U_gender, U_age, U_area, U_school, U_section, U_subject);
the value range of the user role U_type comprises teachers, students and others, and can be expressed as: U_type = {u_teacher, u_student, u_other};
the value range of the user gender U_gender is {0, 1}, where 0 represents female and 1 represents male;
the value range of the school type U_school comprises city, county and town, and can be expressed as: U_school = {u_city, u_town, u_count};
the value range of the section taught U_section comprises primary school, junior high school and senior high school, and can be expressed as U_section = {U_primary, U_junior, U_high, 0}, where users in the student and other roles can only take the value 0;
the value range of the subject taught U_subject comprises Chinese, mathematics, English, physics, chemistry, biology, history, politics, geography, society, science, sports, music, art, health, legal, information technology, comprehensive practice and none, and can be expressed as U_subject = {U_Chinese, U_math, U_English, U_physics, U_chemistry, U_biology, U_history, U_polarity, U_geometry, U_society, U_science, U_sports, U_labor, U_information_technology, U_comprehensive_action, 0}, where users in the student and other roles can only take the value 0;
Table 1 shows a partial example of the basic data collected for users whose user role is teacher, where the total number of such users is 10625;
TABLE 1 Example (partial) of user basic data collection results for users whose role is teacher
Figure BDA0002864911810000131
Figure BDA0002864911810000141
Wherein, teacher 1 is Teacher Li, a 34-year-old female mathematics teacher at an urban primary school in S city, Z province;
teacher 2 is a 33-year-old male biology teacher at a town junior high school in S city, Z province;
teacher 10625 is Teacher Zhao, a 45-year-old male science teacher in D city, Z province;
the teaching material basic data comprises the teaching material id, teaching material name, material form, material use and processing type, and can be represented by M = (M_id, M_name, M_format, M_use, M_type);
the value range of the teaching material form M_format comprises pictures, audio, video and animation, and can be expressed as: M_format = {m_picture, m_audio, m_video, m_animation};
the value range of the teaching material use M_use comprises pre-class preview, in-class use and post-class review, and can be expressed as: M_use = {m_before, m_in, m_after}, where the total number of teaching material uses Num_of_use is 3;
the value range of the teaching material processing type M_type comprises conversion, beautification, selection and integration, and can be expressed as: M_type = {m_convert, m_embellish, m_excerpt, m_integration}, where the total number of teaching material processing types Num_of_processes is 4;
Table 2 is a partial example of the teaching material basic data collected in the embodiment of the present invention, where the total number of teaching materials is 95348;
TABLE 2 example of (partial) acquisition results of basic data of teaching materials
M_id M_name M_format M_use M_type
1 Small m_picture m_before m_convert
2 I m_picture m_in m_integration
... ... ... ... ...
95348 Lesson m_video m_in m_excerpt
Wherein, the teaching material with M_id 1 is a picture teaching material used for pre-class preview after conversion processing;
the teaching material with M_id 2 is a picture teaching material used in class after integration processing;
the teaching material with M_id 95348 is a video teaching material used in class after selection processing;
the teaching material label data comprises the teaching material id, label name and label weight, and can be represented by L = (M_id, L_name, L_weight);
the label weight L_weight represents the number of times the label has been applied, and its value range is [0, +∞);
Table 3 is a partial example of the teaching material label data collected in the embodiment of the present invention, where the total number of teaching material labels is 543325;
TABLE 3 Example (partial) of teaching material label data collection results
M_id L_name L_weight
1 Geometry 5
2 Circular shape 2
... ... ...
95348 Mathematics, and 10
wherein, the teaching material with M _ id of 1 is marked as geometric 5 times and circular 2 times;
the teaching material with M _ id of 95348 is labeled as math 10 times;
the teaching material usage behavior data is the procedural data of users uploading, browsing, collecting, downloading, using and sharing teaching materials; it comprises the usage behavior id, user, usage behavior action, teaching material, behavior time and behavior source, and can be represented by B = (B_id, U, B_action, M, Time, B_source);
the value range of the usage behavior action B_action comprises uploading, browsing, collecting, downloading, using and sharing, and can be expressed as: B_action = {b_upload, b_browse, b_collect, b_download, b_use, b_share};
the value range of the behavior source B_source comprises search, share and others, and can be expressed as: B_source = {b_searched, b_shared, b_other};
Table 4 is a partial example of the collected procedural usage behavior data (uploading, browsing, collecting, downloading, using and sharing teaching materials) provided by the embodiment of the present invention, where there are 406576 usage behavior records in total;
TABLE 4 Example (partial) of teaching material usage behavior data collection results
Figure BDA0002864911810000161
The usage behavior with B_id 1 is that the user with U_id 1, arriving by search, browsed the teaching material with M_id 198 at 07:07:03 on September 1, 2018;
the usage behavior with B_id 2 is that the user with U_id 1 used the teaching material with M_id 198 at 07:08:21 on September 1, 2018;
the usage behavior with B_id 406576 is that the user with U_id 269 browsed the teaching material with M_id 1376 at 23:00:13 on August 30, 2019;
the teaching material scoring behavior data comprises the scoring behavior id, user, teaching material, score and behavior time, and can be represented by S = (S_id, U, M, S_score, Time);
the value range of the score S_score is [0, 5];
Table 5 is a partial example of the collected data on users' scoring of teaching materials according to an embodiment of the present invention, where there are 107613 scoring behavior records in total;
TABLE 5 Example (partial) of teaching material scoring behavior data collection results
Figure BDA0002864911810000162
Figure BDA0002864911810000171
The scoring behavior with S_id 1 is that the user with U_id 1 gave the teaching material with M_id 18 a score of 2 at 21:22:20 on September 3, 2018;
the scoring behavior with S_id 2 is that the user with U_id 1 gave the teaching material with M_id 1958 a score of 5 at 14:07:23 on September 18, 2018;
the scoring behavior with S_id 107613 is that the user with U_id 2239 gave the teaching material with M_id 18723 a score of 4.5 at 23:58:14 on August 23, 2019;
the teaching material comment behavior data comprises the comment behavior id, user, teaching material, comment content and behavior time, and can be represented by C = (C_id, U, M, C_comment, Time);
Table 6 is a partial example of the collected data on users' comments on teaching materials according to the embodiment of the present invention, where there are 252123 comment behavior records in total;
TABLE 6 Example (partial) of teaching material comment behavior data collection results
C_id U M C_comment Time
1 U_id=1765 M_id=1 Is very helpful 2018-09-01 14:12:25
2 U_id=8872 M_id=1 Is not clear 2018-09-02 12:17:03
... ... ... ... ...
252123 U_id=22 M_id=91121 Support for 2019-07-31 14:38:04
The comment behavior with C_id 1 is that the user with U_id 1765 commented "Is very helpful" on the teaching material with M_id 1 at 14:12:25 on September 1, 2018;
the comment behavior with C_id 2 is that the user with U_id 8872 commented "Is not clear" on the teaching material with M_id 1 at 12:17:03 on September 2, 2018;
the comment behavior with C_id 252123 is that the user with U_id 22 commented "Support for" on the teaching material with M_id 91121 at 14:38:04 on July 31, 2019;
the intermediate variables involved in computing the characteristic variable values of each user's teaching material processing capability attributes comprise: the number of themes of the teaching platform Num_of_topic, the usage pattern of teaching resources U_pattern and the used category of teaching resources Q_utype; for user i, the total number of uploaded teaching materials N_all_i, the number of picture teaching materials N_picture_i, the number of audio teaching materials N_audio_i, the number of video teaching materials N_video_i, the number of animation teaching materials N_animation_i, the number of uses of the teaching materials N_use_i, the number of processing types of the teaching materials N_process_i, the number of themes of the teaching materials N_topic_i, the total number of other users who have used the teaching materials U_use_i and the total number of users who have commented on the teaching materials U_comment_i; for teaching material n uploaded by user i, the usage amount U_each_{i,n}, the amount used by the uploader U_teach_{i,n}, the amount used by students U_seach_{i,n}, the shared quantity Q_share_each_{i,n}, the quantity browsed through shared links Q_difference_each_{i,n}, the browsed quantity Q_browse_each_{i,n}, the collected quantity Q_collect_each_{i,n}, the downloaded quantity Q_download_each_{i,n}, the score Q_score_each_{i,n} and the positive-emotion comments Q_emotion_each_{i,n}; and the teaching materials N_time_{i,t} uploaded by user i in each time period t of the T time periods.
The number of themes of the teaching platform Num_of_topic is obtained by multidimensional scaling analysis of the teaching material label network, and its value is 20 in the embodiment of the invention;
the teaching material label network is an undirected network and can be represented by Gl = (L, El), where L represents all labels and El represents the co-occurrence relationships among the labels;
the teaching resource usage pattern U_pattern is the result of K-means clustering based on each user's average usage amount, maximum usage amount, total self-usage amount and total student usage amount, where the selected K value is 4;
the used category Q_utype of the teaching resources is the result of K-means clustering based on each user's average sharing amount, average spread amount, average collection amount, maximum collection amount, average download amount and maximum download amount, where the selected K value is 4;
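The K-means clustering with K = 4 mentioned above can be illustrated with a plain Lloyd's-algorithm sketch. The patent does not state the initialization, distance measure or stopping rule, so the choices here (first k points as initial centroids, squared Euclidean distance, stop when assignments no longer change) are assumptions, and the eight four-feature rows are hypothetical stand-ins for (average usage, maximum usage, total self-usage, total student usage).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points seed the clusters
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move either
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical feature rows, two per cluster (not the patent's data).
users = [
    (1.0, 1.0, 1.0, 1.0), (10.0, 10.0, 1.0, 1.0), (1.0, 1.0, 10.0, 10.0), (10.0, 10.0, 10.0, 10.0),
    (1.2, 0.8, 1.0, 1.0), (9.8, 10.2, 1.0, 1.0), (1.0, 1.0, 9.9, 10.1), (10.0, 10.0, 10.2, 9.8),
]
centroids, u_pattern = kmeans(users, k=4)
```

The resulting label per user plays the role of the categorical U_pattern (or Q_utype) value; in practice the features would normally be scaled first, which the source does not discuss.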
the total number N_all_i of teaching materials uploaded by user i is calculated as: N_all_i = |{B | B_action = b_upload, U = i}|;
the number N_picture_i of picture teaching materials uploaded by user i is calculated as: N_picture_i = |{B | B_action = b_upload, U = i, M_format = m_picture}|;
the number N_audio_i of audio teaching materials uploaded by user i is calculated as: N_audio_i = |{B | B_action = b_upload, U = i, M_format = m_audio}|;
the number N_video_i of video teaching materials uploaded by user i is calculated as: N_video_i = |{B | B_action = b_upload, U = i, M_format = m_video}|;
the number N_animation_i of animation teaching materials uploaded by user i is calculated as: N_animation_i = |{B | B_action = b_upload, U = i, M_format = m_animation}|;
the number N_use_i of uses of the teaching materials uploaded by user i is calculated as: N_use_i = |{M_use | B_action = b_upload, U = i}|;
the number N_process_i of processing types of the teaching materials uploaded by user i is calculated as: N_process_i = |{M_process | B_action = b_upload, U = i}|;
the number N_topic_i of themes of the teaching materials uploaded by user i is determined by counting the identified platform themes to which the labels of those teaching materials belong;
the total number U_use_i of other users who have used the teaching materials uploaded by user i is obtained from the relative in-degree centrality of the teaching platform user usage network;
the user usage network is a directed network and can be represented by Gu = (U, Eu), where U represents all users and an edge in Eu from user i to user j represents that user i has used the teaching resources of user j;
the total number U_comment_i of users who have commented on the teaching materials uploaded by user i is obtained from the relative in-degree centrality of the teaching platform user comment network;
the user comment network is a directed network and can be represented by Gc = (U, Ec), where U represents all users and an edge in Ec from user i to user j represents that user i has commented on the teaching resources of user j;
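The relative in-degree centrality used for both networks can be sketched as below. This assumes the standard definition (number of distinct other users pointing at a user, divided by the number of users minus one), which matches the "U minus one" denominator given for Q_udegree and Q_cdegree; the function name and the four-user edge list are hypothetical.

```python
def relative_in_degree(edges, users):
    """edges: (i, j) pairs meaning user i used (or commented on) resources of user j."""
    sources = {u: set() for u in users}
    for i, j in edges:
        if i != j:             # a user's own use is not use "by others"
            sources[j].add(i)  # count each distinct source user once
    n = len(users)             # assumes at least two users
    return {u: len(s) / (n - 1) for u, s in sources.items()}


# Hypothetical network: users 2 and 3 used user 1's materials (user 2 twice),
# and user 1 used user 4's materials.
users = [1, 2, 3, 4]
edges = [(2, 1), (3, 1), (2, 1), (1, 4)]
centrality = relative_in_degree(edges, users)
```

Multiplying a centrality value by (n - 1) recovers the raw count of other users, which corresponds to U_use_i or U_comment_i depending on which network the edges come from.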
the usage amount U_each_{i,n} of teaching material n uploaded by user i is calculated as: U_each_{i,n} = |{B | B_action = b_use, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the amount U_teach_{i,n} of teaching material n uploaded by user i that is used by the uploader is calculated as: U_teach_{i,n} = |{B | B_action = b_use, M = n, U = i}|, where n ∈ {M | B_action = b_upload, U = i};
the amount U_seach_{i,n} of teaching material n uploaded by user i that is used by students is calculated as: U_seach_{i,n} = |{B | B_action = b_use, M = n, U_type = u_student}|, where n ∈ {M | B_action = b_upload, U = i};
the shared quantity Q_share_each_{i,n} of teaching material n uploaded by user i is calculated as: Q_share_each_{i,n} = |{B | B_action = b_share, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the quantity Q_difference_each_{i,n} of teaching material n uploaded by user i that is browsed through shared links is calculated as: Q_difference_each_{i,n} = |{B | B_action = b_use, B_source = b_shared, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the browsed quantity Q_browse_each_{i,n} of teaching material n uploaded by user i is calculated as: Q_browse_each_{i,n} = |{B | B_action = b_browse, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the collected quantity Q_collect_each_{i,n} of teaching material n uploaded by user i is calculated as: Q_collect_each_{i,n} = |{B | B_action = b_collect, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the downloaded quantity Q_download_each_{i,n} of teaching material n uploaded by user i is calculated as: Q_download_each_{i,n} = |{B | B_action = b_download, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the score Q_score_each_{i,n} of teaching material n uploaded by user i is obtained from the scoring records for that material (the scores S_score with M = n), where n ∈ {M | B_action = b_upload, U = i};
the comment emotional tendency Q_emotion_each_{i,n} of teaching material n uploaded by user i is obtained by emotional tendency analysis in natural language processing; when a comment's emotional tendency is analyzed as positive, the comment is regarded as a positive-emotion comment and counted as 1;
and the number N_time_{i,t} of teaching materials uploaded by user i in each time period t of the T time periods is calculated as: N_time_{i,t} = |{B | B_action = b_upload, B_time ∈ t, U = i}|;
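The counting formulas above are all filters over the behavior records B joined with the material table M. A minimal sketch under stated assumptions: the two tiny tables below are hypothetical, the `month` key stands in for the behavior time, and carrying `U_type` on each behavior record is an illustrative shortcut for a join with the user table.

```python
from collections import Counter

# Hypothetical miniature M and B tables (not the patent's data).
materials = {
    101: {"M_format": "m_picture", "M_use": "m_before", "M_type": "m_convert"},
    102: {"M_format": "m_video", "M_use": "m_in", "M_type": "m_convert"},
}
behaviors = [
    {"U": 1, "U_type": "u_teacher", "B_action": "b_upload", "M": 101, "month": 1},
    {"U": 1, "U_type": "u_teacher", "B_action": "b_upload", "M": 102, "month": 2},
    {"U": 1, "U_type": "u_teacher", "B_action": "b_use", "M": 101, "month": 2},
    {"U": 7, "U_type": "u_student", "B_action": "b_use", "M": 101, "month": 3},
    {"U": 7, "U_type": "u_student", "B_action": "b_download", "M": 102, "month": 3},
]


def user_counts(i):
    """User-level intermediate variables for user i."""
    uploads = [b for b in behaviors if b["B_action"] == "b_upload" and b["U"] == i]
    mats = [materials[b["M"]] for b in uploads]
    return {
        "N_all": len(uploads),
        "N_picture": sum(m["M_format"] == "m_picture" for m in mats),
        "N_use": len({m["M_use"] for m in mats}),       # distinct uses
        "N_process": len({m["M_type"] for m in mats}),  # distinct processing types
        "N_time": Counter(b["month"] for b in uploads),  # uploads per period
    }


def material_counts(i, n):
    """Per-material intermediate variables for material n uploaded by user i."""
    hits = [b for b in behaviors if b["M"] == n]
    return {
        "U_each": sum(b["B_action"] == "b_use" for b in hits),
        "U_teach": sum(b["B_action"] == "b_use" and b["U"] == i for b in hits),
        "U_seach": sum(b["B_action"] == "b_use" and b["U_type"] == "u_student" for b in hits),
        "Q_download_each": sum(b["B_action"] == "b_download" for b in hits),
    }


counts_user_1 = user_counts(1)
counts_material_101 = material_counts(1, 101)
```

The remaining per-material quantities (shared, browsed, collected) follow the same pattern with a different `B_action` value.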
Tables 7 and 8 are examples of the values used for the user teaching material processing capability attributes and their characteristic variables in the embodiment of the present invention, where Table 7 gives the overall values for the teaching materials uploaded by a user and Table 8 gives the specific values for each uploaded teaching material;
TABLE 7 Example (partial) of overall values for teaching materials uploaded by users
Figure BDA0002864911810000201
Figure BDA0002864911810000211
For user 1, the total number of uploaded teaching materials is 139, of which 119 are picture teaching materials, 4 are audio teaching materials, 16 are video teaching materials and none are animation teaching materials; the number of uses of the teaching materials is 2, the number of processing types is 2 and the number of themes is 1; the teaching materials have been used by 20 other users and commented on by 12 users, and the numbers of teaching materials uploaded in each of the 12 months are 10, 46, 0, 0, 1, 0, 14, 42, 22, 4, 0 and 0 respectively;
table 8 example of specific values (parts) of each teaching material uploaded by user
Figure BDA0002864911810000212
Figure BDA0002864911810000221
For user 1, uploaded teaching material 1 has a usage amount of 3, of which 3 are by the user and 0 by students; its shared quantity is 3, no browsing through the shared link is recorded, no browsed quantity is recorded, its collected quantity is 1, its downloaded quantity is 30, its score is 4 and it has 5 positive-emotion comments; uploaded teaching material 139 has a usage amount of 100, of which 2 are by the user and 80 by students; its shared quantity is 1, no browsing through the shared link is recorded, no browsed quantity is recorded, its collected quantity is 0, its downloaded quantity is 63, its score is 4 and it has 2 positive-emotion comments;
forming the teaching material processing capability matrix X_i of each user i, i.e. X_i = (R_picture_i, R_audio_i, R_video_i, D_use_i, D_process_i, D_topic_i, U_average_i, U_max_i, U_self_i, U_student_i, U_pattern_i, Q_share_i, Q_collect_i, Q_mcollect_i, Q_download_i, Q_mdownload_i, Q_score_i, Q_udegree_i, Q_utype_i, Q_emotion_i, Q_cdegree_i, T_fre_i, T_vol_i);
Taking teacher 1 as an example, the values of the teaching material processing capability matrix X_1 of teacher 1 are described as follows;
the picture richness value of teacher 1 is: R_picture_1 = log10(N_picture_1) = log10(119) = 2.08;
the audio richness value of teacher 1 is: R_audio_1 = log10(N_audio_1) = log10(4) = 0.60;
the video richness value of teacher 1 is: R_video_1 = log10(N_video_1) = log10(16) = 1.20;
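The three richness values above can be reproduced directly with a base-10 logarithm. The handling of a zero count (as for teacher 1's animation materials) is not specified in this passage, so returning 0.0 in that case is an assumption made only to keep the sketch total.

```python
import math


def richness(count):
    """log10 of a material count; assumption: a zero count maps to 0.0."""
    return math.log10(count) if count > 0 else 0.0


r_picture = round(richness(119), 2)  # picture materials of teacher 1
r_audio = round(richness(4), 2)      # audio materials of teacher 1
r_video = round(richness(16), 2)     # video materials of teacher 1
```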
The use diversity values of teacher 1 are:
Figure BDA0002864911810000231
the processing type diversity value of the teacher 1 is:
Figure BDA0002864911810000232
the subject diversity values of teacher 1 are:
Figure BDA0002864911810000233
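The three diversity formulas are only available as figures here. A plausible reading, offered as an assumption, is that each diversity value is the number of distinct categories a user covers divided by the total number of categories on the platform; with teacher 1's counts (2 of 3 uses, 2 of 4 processing types, 1 of 20 themes) this reproduces the values 0.67, 0.50 and 0.05 that appear in teacher 1's matrix.

```python
def diversity(distinct_count, total_categories):
    """Share of the available categories that a user's materials cover (assumed definition)."""
    return distinct_count / total_categories


d_use = round(diversity(2, 3), 2)      # N_use_1 / Num_of_use
d_process = round(diversity(2, 4), 2)  # N_process_1 / number of processing types
d_topic = round(diversity(1, 20), 2)   # N_topic_1 / Num_of_topic
```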
the average usage value of teacher 1 is:
Figure BDA0002864911810000234
the maximum usage value of teacher 1 is:
Figure BDA0002864911810000235
the self-use total value of the teacher 1 is:
Figure BDA0002864911810000236
the student total usage value of teacher 1 is:
Figure BDA0002864911810000237
the using mode of the teacher 1 belongs to the first class according to the clustering result;
the average share value of teacher 1 is:
Figure BDA0002864911810000238
the average collection value of teacher 1 is:
Figure BDA0002864911810000239
the maximum collection value of teacher 1 is:
Figure BDA00028649118100002310
the average download value of the teacher 1 is:
Figure BDA0002864911810000241
the maximum download value of the teacher 1 is:
Figure BDA0002864911810000242
the average score value of teacher 1 is:
Figure BDA0002864911810000243
the used centrality value of the teacher 1 is 0.035;
the used category of teacher 1 belongs to the third category according to the clustering result;
the comment emotional tendency value of teacher 1 is:
Figure BDA0002864911810000244
the comment centrality value of teacher 1 is 0.003;
the update frequency of the teacher 1 takes values as follows:
Figure BDA0002864911810000245
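The update frequency is defined as the average number of uploads per time period, so teacher 1's value can be checked from the twelve monthly upload counts given in the description of Table 7. The function name is illustrative.

```python
# Monthly upload counts of teacher 1, as listed in the description of Table 7.
monthly_uploads = [10, 46, 0, 0, 1, 0, 14, 42, 22, 4, 0, 0]


def update_frequency(uploads_per_period):
    """Average number of uploads per time period (T = number of periods)."""
    return sum(uploads_per_period) / len(uploads_per_period)


t_fre = round(update_frequency(monthly_uploads), 2)  # 139 uploads over 12 months
```

The result agrees with the update frequency entry 11.58 in teacher 1's matrix.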
the volatility value of teacher 1 is:
Figure BDA0002864911810000246
the multidimensional input matrix of teacher 1 in the embodiment of the invention is X_1 = (2.08, 0.60, 1.20, 0.67, 0.50, 0.05, 1.93, 2.00, 1.30, 2.33, 1, 0.07, 0.06, 0.00, 8.94, 1.80, 3.98, 0.035, 3, 0.51, 0.003, 11.58, 0.14);
Step S3: selecting a user set, obtaining the teaching material processing capability matrix set corresponding to the user set, and obtaining the manually annotated capability label set of the user set.
Step S31: selecting the user set according to the dimensions of the user's area, school type, section taught and subject taught, denoting the user set as U_teacher and its size as NU, and obtaining the teaching material processing capability matrix X_i of each user in the user set to form the corresponding teaching material processing capability matrix set, denoted X, X = (X_1, X_2, ..., X_i, ..., X_NU)^T, where user i ∈ U_teacher;
The user set U_teacher constructed in the embodiment of the invention consists of the science users of all areas, all school types and all sections in Z province, and NU = 5023;
in the embodiment of the invention, the teaching material processing capability evaluation input data set X_{u_teacher} of the user set U_teacher is the matrix composed of the matrices X_i of the 5023 science users of all areas, all school types and all sections in Z province, i.e.
Figure BDA0002864911810000251
Step S32: obtaining the manually annotated capability label set of the user set U_teacher, denoted Y_{u_teacher}, Y_{u_teacher} = (Y_1, Y_2, ..., Y_i, ..., Y_NU)^T, where Y_i is the capability label of user i, i ∈ U_teacher;
The capability label Y_i is determined from the user's self-annotation data and the expert annotation data: first, the error e_i = |St_i - Se_i| between the user's self-annotation data St_i and the first expert's annotation data Se_i is calculated; if e_i is less than the preset critical value E, Y_i is the average of the two; if e_i is greater than the critical value E, the second expert's annotation data Sa_i is obtained, the distances from Sa_i to St_i and to Se_i are calculated separately, and Y_i is the average of Sa_i and whichever of the two scores is closer to it; the calculation formula is as follows:
Figure BDA0002864911810000252
Table 9 is a partial example of the user teaching material processing capability value labels in the embodiment of the present invention, where the critical value E is 20.
TABLE 9 Example (partial) of user teaching material processing capability value labels
Figure BDA0002864911810000253
Figure BDA0002864911810000261
The self-evaluation score of teacher 1 is St_1 = 100 and the expert evaluation score is Se_1 = 100, so the final teaching material processing score of teacher 1 is Y_1 = (100 + 100)/2 = 100; the self-evaluation score of teacher 2 is St_2 = 75 and the expert evaluation score is Se_2 = 100, so e_2 = |St_2 - Se_2| = |75 - 100| = 25 > E = 20; the evaluation score of the second expert is Sa_2 = 80, and |80 - 100| > |80 - 75|, so the final teaching material processing score of teacher 2 is Y_2 = (80 + 75)/2 = 77.5; the self-evaluation score of teacher 3 is St_3 = 100 and the expert evaluation score is Se_3 = 90, so the final teaching material processing score of teacher 3 is Y_3 = (100 + 90)/2 = 95; the final user teaching material processing capability score matrix is Y = (100, 77.5, ..., 95)^T;
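The labeling rule and the three worked cases above can be expressed as a short function. Two details are not fixed by the text and are assumptions here: an error exactly equal to E is treated as agreement, and a tie in distance to the second expert's score favors the self-evaluation score. The function and parameter names are illustrative.

```python
def capability_label(st, se, sa=None, threshold=20):
    """Fuse self score st, first expert score se and optional second expert score sa."""
    if abs(st - se) <= threshold:  # assumption: equality counts as agreement
        return (st + se) / 2
    if sa is None:
        raise ValueError("a second expert score is required when the first two disagree")
    # Average the second expert's score with whichever earlier score is closer to it.
    closer = st if abs(sa - st) <= abs(sa - se) else se  # assumption: ties favor st
    return (sa + closer) / 2


y1 = capability_label(100, 100)       # teacher 1
y2 = capability_label(75, 100, sa=80)  # teacher 2, threshold exceeded
y3 = capability_label(100, 90)        # teacher 3
```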
Step S4: constructing regression models based on several machine learning methods, each regression model outputting the identified capability label from an input teaching material processing capability matrix, and training the regression models with the teaching material processing capability matrix set and the capability label set to determine the optimal regression model.
The regression models comprise a multiple linear regression model, a random forest regression model, a support vector machine regression model and a BP neural network regression model;
the multiple linear regression model is a linear regression model fitted by minimizing the sum of squared residuals between the value labels of the sample users and the values predicted by the linear model; the value label is calculated as follows:
Figure BDA0002864911810000262
wherein Y is the value label, C is a constant, R_picture is the picture richness characteristic variable, R_audio is the audio richness characteristic variable, R_video is the video richness characteristic variable, R_animation is the animation richness characteristic variable, D_use is the use diversity characteristic variable, D_process is the processing type diversity characteristic variable, D_topic is the theme diversity characteristic variable, U_average is the average usage characteristic variable, U_max is the maximum usage characteristic variable, U_self is the self-usage total characteristic variable, U_student is the student usage total characteristic variable, U_pattern is the usage pattern characteristic variable, Q_share is the average share characteristic variable, Q_dispersion is the average spread characteristic variable, Q_dispersion_rate is the spread rate characteristic variable, Q_collect is the average collection characteristic variable, Q_mcollect is the maximum collection characteristic variable, Q_download is the average download characteristic variable, Q_mdownload is the maximum download characteristic variable, Q_recognition is the recognition rate characteristic variable, Q_score is the average score characteristic variable, Q_udegree is the used centrality characteristic variable, Q_utype is the used category characteristic variable, Q_emotion is the comment emotional tendency characteristic variable, Q_cdegree is the comment centrality characteristic variable, T_fre is the update frequency characteristic variable, and T_vol is the volatility characteristic variable,
Figure BDA0002864911810000271
and ω1~ω26 are the weight coefficients obtained by training, and ε is the error;
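The linear model described above amounts to a constant plus a weighted sum of the characteristic variables, fitted by minimizing the residual sum of squares. The sketch below shows only the prediction and the quantity being minimized, not a fitting procedure; the two-feature weights, constant and samples are hypothetical and chosen purely for illustration.

```python
def predict(features, weights, constant):
    """Linear prediction: constant + sum of weight * feature."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return constant + sum(w * x for w, x in zip(weights, features))


def residual_sum_of_squares(rows, labels, weights, constant):
    """Sum of squared differences between actual labels and linear predictions."""
    return sum((y - predict(x, weights, constant)) ** 2 for x, y in zip(rows, labels))


# Hypothetical two-feature model and samples (not the patent's coefficients).
weights, constant = [2.0, -1.0], 50.0
rows = [[3.0, 4.0], [1.0, 1.0]]
labels = [52.0, 50.0]
y_hat = predict(rows[0], weights, constant)
rss = residual_sum_of_squares(rows, labels, weights, constant)
```

Training would search for the weights and constant that make `rss` as small as possible over the training users.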
the random forest regression model is an algorithm model that uses CART decision trees as weak learners with randomly selected features; T weak learners are trained independently on T rounds of sampling, and the final result is obtained by combining the regression results of the T weak learners with a weighted average;
the support vector machine regression model maps the input teaching material processing capability matrix into a high-dimensional feature space through a kernel function to perform the regression calculation of the value label; the value label is calculated as follows:
Figure BDA0002864911810000272
wherein Y is a value tag, wherein,
Figure BDA0002864911810000273
and α_i are the Lagrange coefficients, x is the characteristic variable of the input user teaching material processing attributes,
Figure BDA0002864911810000274
is the transpose of the characteristic variable x_i,
Figure BDA0002864911810000275
is the kernel function, which satisfies
Figure BDA0002864911810000276
b is a constant;
the BP neural network regression model is a three-layer neural network with an input layer, a hidden layer and an output layer, each composed of several neurons; the input layer takes the 27 characteristic variables of the user teaching material processing attributes, the hidden layer has 9 neurons, and the output layer produces 1 value label; regression of the value label is realized through full connection of the neurons;
the sample data formed by the teaching material processing capability matrix set and the capability label set is divided into k groups; each time, 1 group of teachers is taken from the k groups as the test set and the remaining k-1 groups are used as the training set, so that the regression model is trained k times in turn; k is 10 in the embodiment of the invention;
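The k-fold procedure above can be sketched as an index generator. The source does not say how the k groups are formed, so the round-robin grouping without shuffling used here is an assumption; the sample count 5023 is the size of the user set in the embodiment.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices); each group is the test set exactly once."""
    folds = [list(range(f, n_samples, k)) for f in range(k)]  # assumed round-robin grouping
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test


splits = list(k_fold_indices(5023, 10))  # NU = 5023 users, k = 10
```

Each `(train, test)` pair would be used to fit a regression model on the training users and score it on the held-out group.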
the evaluation value of the trained model is the mean absolute percentage error of the regression model, denoted MAPE, which is calculated as follows:
$$\mathrm{MAPE} = \frac{100\%}{M} \sum_{j=1}^{M} \left| \frac{y'_j - y_j}{y_j} \right|$$
wherein M is the number of users in the test set, y'_j is the predicted value of the capability label of teacher j, and y_j is the actual value of the capability label of teacher j;
and S5, comparing the evaluation effects of the different regression models, determining the regression model with the minimum MAPE value as the optimal regression model, and dynamically identifying the processing capacity of the user teaching material.
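As an illustration of this comparison step, the sketch below computes MAPE using its standard definition and picks the model with the smallest value. The model names and all numbers are hypothetical toy data, not results from the patent.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (undefined if an actual value is 0)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)

def select_best_model(actual, predictions_by_model):
    """Return the name of the model with the smallest MAPE and all MAPE values."""
    scores = {name: mape(actual, preds) for name, preds in predictions_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical test-set labels and predictions from two hypothetical models.
actual = [80.0, 60.0, 50.0]
predictions_by_model = {
    "model_a": [72.0, 66.0, 50.0],
    "model_b": [64.0, 66.0, 55.0],
}
best_name, mape_scores = select_best_model(actual, predictions_by_model)
```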
In the embodiment of the invention, the average MAPE value of the four regression models is 10.76%, which on the whole confirms the validity of the multi-source-data-fusion-based regression models for automatically identifying users' teaching material processing capability. Among them, the model L1 based on multiple linear regression has a loss function (MAPE) value of only 5.29%, so the model L1 based on multiple linear regression is finally selected as the optimal regression model, namely the final regression model for automatically identifying users' teaching material processing capability;
In the optimal regression model L1, picture richness R_picture, audio richness R_audio, video richness R_video, use diversity D_use, theme diversity D_topic, processing type diversity D_process, average usage U_average, maximum usage U_max, self usage U_self, student usage U_student, usage pattern U_pattern, average share Q_share, average collection Q_collect, maximum collection Q_mcollect, average download Q_download, maximum download Q_mdownload, average score Q_score, used centrality Q_udegree, used category Q_utype, comment emotional tendency Q_emotion, comment centrality Q_cdegree, update frequency T_fre and volatility T_vol are used as independent variables, and the user's teaching material processing capability is used as the dependent variable, for stepwise regression analysis. After automatic selection by the model, 7 variables remain: picture richness, use diversity, average usage, average share, average collection, maximum download and volatility. The R-squared value is 0.716, meaning that these 7 variables explain 71.6% of the variation in the final score. The model passes the F test (F = 34.208, p = 0.000 < 0.05), which indicates that the model is valid. In addition, the model is checked for multicollinearity: all VIF values in the model are smaller than 5, so there is no collinearity problem; and the D-W value (D-W = 2.016) is close to 2, which indicates that the model has no autocorrelation, that the sample data are not correlated with each other, and that the model is sound. Table 10 shows the detailed results of the stepwise regression model L1 according to the embodiment of the invention.
Table 10. Detailed results of the stepwise regression model L1 of the embodiment of the invention.
Figure BDA0002864911810000291
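Two of the diagnostics quoted above, the R-squared value and the Durbin-Watson statistic, can be computed from actual and predicted scores as in this minimal sketch using their standard definitions. The data are toy values, not the patent's sample, and the VIF check is left out because it needs one auxiliary regression per variable.

```python
def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical toy values.
actual = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.5]
residuals = [a - p for a, p in zip(actual, predicted)]
r2 = r_squared(actual, predicted)
dw = durbin_watson(residuals)
```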
The final regression equation is: final score Y_U_teacher = 41.680 + 6.685 × picture richness R_picture + 26.389 × use diversity D_use + 16.463 × average usage U_average - 1.153 × average share Q_share + 19.064 × average collection Q_collect + 4.927 × maximum download Q_mdownload - 35.233 × volatility T_vol;
taking teacher 10626, a male primary school mathematics teacher in a certain city of Z province who is not part of the sample, as an example, the automatic evaluation of teaching material processing capability is illustrated;
for teacher 10626, the picture richness R_picture is 1.08, the use diversity D_use is 1.00, the average usage U_average is 0.54, the average share Q_share is 3.93, the average collection Q_collect is 0.00, the maximum download Q_mdownload is 0.49, and the volatility T_vol is 0.31;
according to the capability evaluation model L_u_teacher, the teaching material processing capability score of user 10626 is automatically calculated to be 71.14.
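The worked example for teacher 10626 can be reproduced by plugging the seven feature values into the final regression equation. The sketch below does this; the dictionary layout, the function name and the normalized variable spellings (for example `Q_collect`) are illustrative assumptions, while the coefficients and feature values are those given in the text.

```python
INTERCEPT = 41.680
COEFFICIENTS = {
    "R_picture": 6.685,     # picture richness
    "D_use": 26.389,        # use diversity
    "U_average": 16.463,    # average usage
    "Q_share": -1.153,      # average share
    "Q_collect": 19.064,    # average collection
    "Q_mdownload": 4.927,   # maximum download
    "T_vol": -35.233,       # volatility
}

def capability_score(features):
    """Evaluate the final regression equation for one user's feature values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)

teacher_10626 = {
    "R_picture": 1.08, "D_use": 1.00, "U_average": 0.54, "Q_share": 3.93,
    "Q_collect": 0.00, "Q_mdownload": 0.49, "T_vol": 0.31,
}
score = capability_score(teacher_10626)
print(round(score, 2))  # 71.14
```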
And step S6, dynamically identifying the processing capacity of the user teaching materials by using the trained optimal regression model.
Collecting user update data at time t;
dynamically updating the capability label of the user based on the user updating data and the trained optimal regression model;
taking half a year as the update period C, the multi-source process data on the teaching material processing capability of teacher 10626 is updated: on February 29, 2020, the picture richness R_picture of teacher 10626 was 1.28, the use diversity D_use was 1.00, the average usage U_average was 0.64, the average share Q_share was 3.93, the average collection Q_collect was 0.00, the maximum download Q_mdownload was 0.49 and the volatility T_vol was 0.37 (the final score for this date is not stated); on August 30, 2020, the picture richness R_picture of teacher 10626 was 2.68, the use diversity D_use was 1.00, the average usage U_average was 0.81, the average share Q_share was 3.23, the average collection Q_collect was 0.01, the maximum download Q_mdownload was 0.49 and the volatility T_vol was 0.38, giving a final score of 84.81;
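The periodic update can be sketched by re-evaluating the same regression equation on each half-year snapshot. The data layout and names below are assumptions; the coefficients and feature values come from the text. The August 2020 result matches the stated 84.81, whereas the February 2020 value is simply what the equation yields for the listed features and is not a figure given in the text.

```python
INTERCEPT = 41.680
COEFFICIENTS = {
    "R_picture": 6.685, "D_use": 26.389, "U_average": 16.463, "Q_share": -1.153,
    "Q_collect": 19.064, "Q_mdownload": 4.927, "T_vol": -35.233,
}

def capability_score(features):
    """Evaluate the final regression equation for one snapshot of feature values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)

# Feature snapshots of teacher 10626, keyed by collection date (ISO format).
snapshots = {
    "2020-02-29": {"R_picture": 1.28, "D_use": 1.00, "U_average": 0.64, "Q_share": 3.93,
                   "Q_collect": 0.00, "Q_mdownload": 0.49, "T_vol": 0.37},
    "2020-08-30": {"R_picture": 2.68, "D_use": 1.00, "U_average": 0.81, "Q_share": 3.23,
                   "Q_collect": 0.01, "Q_mdownload": 0.49, "T_vol": 0.38},
}

# Recompute the capability label for every update period.
scores = {date: capability_score(features) for date, features in snapshots.items()}
for date in sorted(scores):
    print(date, round(scores[date], 2))
```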
The embodiment of the invention further provides a multi-source-data-fusion-based system for automatically identifying the processing capability of user teaching materials, comprising:
the pre-defining module is used for pre-defining attributes of processing capacity of the user teaching materials and characteristic variables contained in each attribute;
the data acquisition module is used for collecting user data from the teaching platform and performing multi-source data fusion on the user data along the behavior, content and social dimensions to determine the characteristic variable values of the attributes of each user's teaching material processing capability, wherein the one-dimensional array formed by the values of all characteristic variables of all attributes of each user's teaching material processing capability constitutes the teaching material processing capability matrix of that user;
the sample acquisition module is used for setting a screening condition to select a user set, acquiring a teaching material processing capacity matrix set corresponding to the user set and also acquiring a manually marked capacity label set of the user set;
the training module is used for constructing multiple regression models based on multiple machine learning methods, wherein the regression models output recognized capability labels according to an input teaching material processing capability matrix; the regression models are trained using the teaching material processing capability matrix set and the capability label set, and an optimal regression model is determined;
and the recognition module is used for dynamically recognizing the processing capacity of the user teaching materials by utilizing the trained optimal regression model.
The implementation principle and technical effect of the system are similar to those of the method, and are not described herein again.
It should be noted that, in any of the above embodiments, the steps of the method are not necessarily executed in the order of their sequence numbers; unless the execution logic requires a particular order, they may be executed in any other feasible order.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for automatically identifying processing capacity of a user teaching material based on multi-source data fusion is applied to a teaching platform supporting processing or management of the teaching material, and is characterized by comprising the following steps:
s1, predefining attributes of the processing capacity of the user teaching materials and characteristic variables contained in each attribute;
S2, collecting user data from the teaching platform, performing multi-source data fusion on the user data by analysis methods based on the behavior, content and social dimensions, and determining the characteristic variable values of the attributes of each user's teaching material processing capability, wherein the one-dimensional array formed by the values of all characteristic variables of all attributes of each user's teaching material processing capability constitutes the teaching material processing capability matrix of that user;
s3, selecting a user set, acquiring a teaching material processing capacity matrix set corresponding to the user set, and acquiring a manually labeled capacity label set of the user set;
s4, constructing multiple regression models based on multiple machine learning methods, wherein the regression models are used for outputting recognized capability labels according to input teaching material processing capability matrixes, training the regression models by using the teaching material processing capability matrix set and the capability label set, and determining an optimal regression model;
and S5, dynamically identifying the processing capacity of the user teaching materials by using the trained optimal regression model.
2. The method for automatically identifying the processing capability of the user teaching materials based on the multi-source data fusion as claimed in claim 1, wherein the attributes of the processing capability of the teaching materials include richness, diversity, usability, usefulness and timeliness;
the richness is used for expressing quantity distribution characteristics of teaching materials in different file formats;
the diversity is used for representing the distribution characteristics of the purpose and the processing type of the teaching materials;
the usability is used for representing the use characteristics of the uploader of the teaching materials on the teaching materials;
the usefulness is used for representing the recognition characteristics of the teaching materials by other people except the uploader of the teaching materials;
the timeliness is used for representing the fluctuation characteristics of the updating frequency of the teaching materials.
3. The method of claim 2, wherein the richness comprises 4 characteristic variables of picture richness, audio richness, video richness and animation richness;
the diversity comprises 3 characteristic variables of use diversity, processing type diversity and theme diversity;
the usability comprises 5 characteristic variables of average usage, maximum usage, total self-usage, total student usage and usage pattern;
the usefulness comprises 13 characteristic variables of average share quantity, average spread quantity, spread rate, average collection quantity, maximum collection quantity, average download quantity, maximum download quantity, recognition rate, average score, used centrality, used category, comment emotional tendency and comment centrality;
the timeliness comprises 2 characteristic variables of updating frequency and volatility.
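The five attributes and their characteristic variables listed above add up to 27 variables (4 + 3 + 5 + 13 + 2). A minimal sketch of how they could be laid out and flattened into a one-dimensional feature array follows; the identifier spellings are taken from, or normalized from, the variable names used elsewhere in this document, and the dictionary layout and helper function are assumptions.

```python
FEATURES_BY_ATTRIBUTE = {
    "richness": ["R_picture", "R_audio", "R_video", "R_animation"],
    "diversity": ["D_use", "D_process", "D_topic"],
    "usability": ["U_average", "U_max", "U_self", "U_student", "U_pattern"],
    "usefulness": [
        "Q_share", "Q_difference", "Q_difference_rate", "Q_collect", "Q_mcollect",
        "Q_download", "Q_mdownload", "Q_recognition", "Q_score", "Q_udegree",
        "Q_utype", "Q_emotion", "Q_cdegree",
    ],
    "timeliness": ["T_fre", "T_vol"],
}

# Fixed column order for the one-dimensional feature array of a user.
FEATURE_ORDER = [name for names in FEATURES_BY_ATTRIBUTE.values() for name in names]

def to_feature_vector(values):
    """Flatten a {variable name: value} mapping into the fixed 27-element order."""
    missing = [name for name in FEATURE_ORDER if name not in values]
    if missing:
        raise KeyError(f"missing characteristic variables: {missing}")
    return [float(values[name]) for name in FEATURE_ORDER]

example_vector = to_feature_vector({name: 0.0 for name in FEATURE_ORDER})
```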
4. The method for automatically identifying processing capability of user teaching materials based on multi-source data fusion as claimed in claim 1, wherein the user data comprises user basic data, teaching material basic data, teaching material label data, teaching material use behavior data, teaching material scoring behavior data and teaching material comment behavior data;
the user basic data comprises a user id, a user name, a user role, a user gender, a user age, the region where the user is located, a school type, the school stage taught and the subject taught;
the teaching material basic data comprises a teaching material id, a teaching material name, a material form, a material purpose and a processing type;
the teaching material label data comprises a teaching material id, a label name and a label weight;
the teaching material use behavior data comprises a use behavior id, a user, a use behavior action, a teaching material, a behavior time and a behavior source;
the teaching material scoring behavior data comprises a scoring behavior id, a user, a teaching material, a score and a behavior time;
the teaching material comment behavior data comprises a comment behavior id, a user, a teaching material, comment content and a behavior time.
5. The automatic identification method for processing and processing capacity of user teaching materials based on multi-source data fusion as claimed in claim 3, wherein the analysis method based on behavior dimension includes descriptive statistical analysis and K-means cluster analysis, and is mainly used for calculating picture richness, audio richness, video richness, animation richness, use diversity, processing type diversity, average usage amount, maximum usage amount, self-usage total amount, student usage total amount, usage pattern, average sharing amount, average transmission amount, transmission rate, average collection amount, maximum collection amount, average download amount, maximum download amount, acceptance rate, average score, used category, update frequency and volatility characteristic variables;
the analysis method based on the content dimension comprises multidimensional scaling analysis and emotional tendency analysis, and is mainly used for calculating the theme diversity and comment emotional tendency characteristic variables;
the analysis method based on the social dimension comprises social network analysis, and is mainly used for calculating the used centrality and comment centrality characteristic variables.
6. The method for automatically identifying processing capability of user teaching materials based on multi-source data fusion as claimed in claim 1, wherein the step S3 includes the steps of:
S31, selecting the user set according to the dimensions of the region where the users are located, the school type, the school stage taught and the subject taught, denoting the user set as U_teacher and the number of users in it as NU, and acquiring the teaching material processing capability matrix X_i corresponding to each user in the user set to form the corresponding teaching material processing capability matrix set, denoted X, X = (X_1, X_2, ..., X_i, ..., X_NU)^T, wherein X_i ∈ U_teacher;
S32, acquiring the manually labeled capability label set of the user set U_teacher, denoted Y_u_teacher, Y_u_teacher = (Y_1, Y_2, ..., Y_i, ..., Y_NU)^T, wherein Y_i is the capability label of each user, Y_i ∈ U_teacher;
the capability label Y_i is determined from the user's self-labeling data and the expert labeling data: first, the error e_i = |St_i - Se_i| between the user's self-labeling data St_i and the first expert labeling data Se_i is calculated; if e_i is less than a set threshold E, the capability label Y_i is the average of the two; if e_i is greater than the set threshold E, second expert labeling data Sa_i is obtained, the distances from Sa_i to St_i and to Se_i are calculated separately, and the capability label Y_i is the average of Sa_i and whichever of the two scores is closer to it, the calculation formula being as follows:
Figure FDA0002864911800000041
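The labeling rule of this claim can be sketched as a small function. The claim does not say what happens when the error equals the threshold exactly or when the second expert score is equally far from both earlier scores; the handling of those two cases below, like the function and parameter names, is an assumption.

```python
def capability_label(self_score, expert_score, threshold, second_expert_score=None):
    """Combine self-labeling and expert labeling into one capability label."""
    error = abs(self_score - expert_score)
    if error < threshold:
        # The two scores agree closely enough: use their average.
        return (self_score + expert_score) / 2
    if second_expert_score is None:
        raise ValueError("a second expert score is needed when the error reaches the threshold")
    # Average the second expert score with whichever earlier score is closer to it
    # (assumed: a tie goes to the self-labeled score).
    if abs(second_expert_score - self_score) <= abs(second_expert_score - expert_score):
        closer = self_score
    else:
        closer = expert_score
    return (second_expert_score + closer) / 2

label_agree = capability_label(80, 84, threshold=10)
label_disagree = capability_label(60, 90, threshold=10, second_expert_score=85)
```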
7. The method for automatically identifying processing capability of user teaching materials based on multi-source data fusion as claimed in claim 1, wherein the multiple regression models comprise a multiple linear regression model, a random forest regression model, a support vector machine regression model and a BP neural network regression model;
the multiple linear regression model is a linear regression model fitted by minimizing the sum of squared residuals between the value labels of the sample users and the predicted values of the linear model, and the calculation formula of the value labels is as follows:
Figure FDA0002864911800000042
wherein Y is the value label, C is a constant, R_picture is the picture richness characteristic variable, R_audio is the audio richness characteristic variable, R_video is the video richness characteristic variable, R_animation is the animation richness characteristic variable, D_use is the use diversity characteristic variable, D_process is the processing type diversity characteristic variable, D_topic is the theme diversity characteristic variable, U_average is the average usage characteristic variable, U_max is the maximum usage characteristic variable, U_self is the total self-usage characteristic variable, U_student is the total student usage characteristic variable, U_pattern is the usage pattern characteristic variable, Q_share is the average share characteristic variable, Q_difference is the average spread characteristic variable, Q_difference_rate is the spread rate characteristic variable, Q_collect is the average collection characteristic variable, Q_mcollect is the maximum collection characteristic variable, Q_download is the average download characteristic variable, Q_mdownload is the maximum download characteristic variable, Q_recognition is the recognition rate characteristic variable, Q_score is the average score characteristic variable, Q_udegree is the used centrality characteristic variable, Q_utype is the used category characteristic variable, Q_emotion is the comment emotional tendency characteristic variable, Q_cdegree is the comment centrality characteristic variable, T_fre is the update frequency characteristic variable, T_vol is the volatility characteristic variable,
Figure FDA0002864911800000051
and ω1~ω26 are the weight coefficients obtained by training, and ε is the error;
the random forest regression model is an algorithm model that uses CART decision trees as weak learners with randomly selected features; T weak learners are trained independently on T rounds of sampling, and the final result is obtained as a weighted average of the regression results of the T weak learners;
the support vector machine regression model is used for mapping an input teaching material processing capacity matrix into a high-dimensional feature space through a kernel function to realize regression calculation of a value label, and the calculation formula of the value label is as follows:
Figure FDA0002864911800000052
wherein Y is the value label,
Figure FDA0002864911800000053
and αi are the Lagrange coefficients, x is the input characteristic variable of the user's teaching material processing attributes,
Figure FDA0002864911800000054
is the transpose of the characteristic variable xi,
Figure FDA0002864911800000055
is the kernel function, which satisfies
Figure FDA0002864911800000056
and b is a constant;
the BP neural network regression model is a three-layer neural network with an input layer, a hidden layer and an output layer, each composed of several neurons; the input layer takes the 27 characteristic variables of the user's teaching material processing attributes, the hidden layer has 9 neurons, the output layer produces 1 value label, and regression of the value label is realized through full connection of the neurons.
8. The method for automatically identifying processing capability of user teaching materials based on multi-source data fusion as claimed in claim 1, wherein the step S4 includes the steps of:
dividing the sample data formed by the teaching material processing capability matrix set and the capability label set into k groups; each time, 1 of the k groups of teachers is taken as the test set and the remaining k-1 groups of teachers are taken as the training set, so that the regression model is trained k times in turn;
the evaluation value of the trained model is the mean absolute percentage error of the regression model, denoted MAPE, which is calculated as follows:
$$\mathrm{MAPE} = \frac{100\%}{M} \sum_{j=1}^{M} \left| \frac{y'_j - y_j}{y_j} \right|$$
wherein M is the number of users in the test set, y'_j is the predicted value of the capability label of teacher j, and y_j is the actual value of the capability label of teacher j;
and comparing the evaluation effects of different regression models, and determining the regression model with the minimum MAPE value as the optimal regression model.
9. The method for automatically identifying processing capability of user teaching materials based on multi-source data fusion as claimed in claim 1, further comprising step S6:
collecting user update data at time t;
and dynamically updating the capability labels of the users based on the user updating data and the trained optimal regression model.
10. A system for automatically identifying processing capability of user teaching materials based on multi-source data fusion, applied to a teaching platform supporting the processing or management of teaching materials, characterized by comprising:
the pre-defining module is used for pre-defining attributes of processing capacity of the user teaching materials and characteristic variables contained in each attribute;
the data acquisition module is used for collecting user data from the teaching platform and performing multi-source data fusion on the user data along the behavior, content and social dimensions to determine the characteristic variable values of the attributes of each user's teaching material processing capability, wherein the one-dimensional array formed by the values of all characteristic variables of all attributes of each user's teaching material processing capability constitutes the teaching material processing capability matrix of that user;
the sample acquisition module is used for setting a screening condition to select a user set, acquiring a teaching material processing capacity matrix set corresponding to the user set and also acquiring a manually marked capacity label set of the user set;
the training module is used for constructing multiple regression models based on multiple machine learning methods, wherein the regression models output recognized capability labels according to an input teaching material processing capability matrix; the regression models are trained using the teaching material processing capability matrix set and the capability label set, and an optimal regression model is determined;
and the recognition module is used for dynamically recognizing the processing capacity of the user teaching materials by utilizing the trained optimal regression model.
CN202011583583.7A 2020-12-28 2020-12-28 Automatic identification method and system for processing capability of user teaching materials Active CN112699933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011583583.7A CN112699933B (en) 2020-12-28 2020-12-28 Automatic identification method and system for processing capability of user teaching materials

Publications (2)

Publication Number Publication Date
CN112699933A (en) 2021-04-23
CN112699933B (en) 2023-07-07

Family

ID=75513027

Country Status (1)

Country Link
CN (1) CN112699933B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant