CN112699933A - Automatic identification method and system for processing capacity of user teaching material - Google Patents
- Publication number
- CN112699933A CN202011583583.7A
- Authority
- CN
- China
- Prior art keywords
- user
- teaching
- teaching material
- average
- characteristic variable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention discloses a method and a system for automatically identifying a user's teaching material processing capability based on multi-source data fusion. The method comprises the following steps: S1, predefining the attributes of the user's teaching material processing capability and the characteristic variables contained in each attribute, wherein the attributes comprise richness, diversity, usability, usefulness and timeliness; S2, collecting user data from the teaching platform, and calculating a teaching material processing capability matrix for each user from the user data; S3, acquiring sample data of a user set; S4, constructing regression models based on multiple machine learning methods, training the regression models with the sample data, and determining an optimal regression model; and S5, dynamically identifying the user's teaching material processing capability by using the trained optimal regression model. The invention enables intelligent automatic identification of the user's teaching material processing and handling capability.
Description
Technical Field
The invention belongs to the field of education informatization, and particularly relates to a method and a system for automatically identifying processing capacity of a user teaching material based on multi-source data fusion.
Background
With the development of computer technology, teaching platforms that assist teaching have become important information carriers in teaching. Such platforms include, but are not limited to, regional education resource public service platforms, online teaching platforms, online research and training platforms, online training platforms and education management platforms. In platform-based teaching, the processing of teaching materials and the identification of the teaching material processing capability of users such as teachers are very important.
At present, a user's teaching material processing and handling capability is still identified by questionnaire: for example, the user performs a self-evaluation using scales or test questions. This approach considers only the user's current state, is subjective to some degree and requires a high level of cooperation, and it ignores the objective process data generated when the user processes and handles teaching materials. It therefore suffers from inaccurate identification, low identification efficiency and low data utilization. How to use computer technology to achieve more objective, more accurate and more continuous intelligent automatic identification based on the user's data on the teaching platform is an important problem. The prior art offers no mature computer-based automatic identification technology.
Disclosure of Invention
In view of at least one defect or improvement requirement in the prior art, the invention provides a method and a system for automatically identifying a user's teaching material processing capability based on multi-source data fusion, which enable intelligent automatic identification of the user's teaching material processing and handling capability.
In order to achieve the above object, according to a first aspect of the present invention, there is provided a method for automatically identifying processing capability of a user teaching material based on multi-source data fusion, which is applied to a teaching platform supporting processing or management of the teaching material, and includes the steps of:
S1, predefining the attributes of the user's teaching material processing capability and the characteristic variables contained in each attribute;
S2, collecting user data from the teaching platform, performing multi-source data fusion on the user data with analysis methods based on the behavior, content and social dimensions, and determining the characteristic variable values of the attributes of each user's teaching material processing capability, wherein the one-dimensional array formed by the values of all characteristic variables of all attributes constitutes the teaching material processing capability matrix of that user;
S3, selecting a user set, acquiring the teaching material processing capability matrix set corresponding to the user set, and acquiring a manually labeled capability label set of the user set;
S4, constructing multiple regression models based on multiple machine learning methods, wherein each regression model outputs a recognized capability label from an input teaching material processing capability matrix, training the regression models with the teaching material processing capability matrix set and the capability label set, and determining an optimal regression model;
S5, dynamically identifying the user's teaching material processing capability by using the trained optimal regression model.
Preferably, the attributes of the processing capability of the teaching materials comprise richness, diversity, usability, usefulness and timeliness;
the richness is used for expressing quantity distribution characteristics of teaching materials in different file formats;
the diversity is used for representing the distribution characteristics of the purpose and the processing type of the teaching materials;
the usability is used for representing the use characteristics of the uploader of the teaching materials on the teaching materials;
the usefulness is used for representing the recognition characteristics of the teaching materials by other people except the uploader of the teaching materials;
the timeliness is used for representing the fluctuation characteristics of the updating frequency of the teaching materials.
Preferably, the richness comprises 4 characteristic variables of picture richness, audio richness, video richness and animation richness;
the diversity comprises 3 characteristic variables of use diversity, processing type diversity and theme diversity;
the usability comprises 5 characteristic variables of average usage, maximum usage, total self-usage, total student usage and usage pattern;
the usefulness comprises 13 characteristic variables of average share quantity, average spread quantity, spread rate, average collection quantity, maximum collection quantity, average download quantity, maximum download quantity, recognition rate, average score, used centrality, used category, comment emotional tendency and comment centrality;
the timeliness comprises 2 characteristic variables of updating frequency and volatility.
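The attribute hierarchy listed above can be sketched as a small Python mapping. This is only an illustration: the identifiers follow the abbreviations used later in the description, while the names `FEATURES`, `feature_order` and `to_vector` are hypothetical and do not come from the patent.

```python
# Attribute -> characteristic variables, as enumerated in the text.
FEATURES = {
    "richness": ["R_picture", "R_audio", "R_video", "R_animation"],
    "diversity": ["D_use", "D_process", "D_topic"],
    "usability": ["U_average", "U_max", "U_self", "U_student", "U_pattern"],
    "usefulness": [
        "Q_share", "Q_difference", "Q_difference_rate", "Q_collect",
        "Q_mcollect", "Q_download", "Q_mdownload", "Q_recognition",
        "Q_score", "Q_udegree", "Q_utype", "Q_emotion", "Q_cdegree",
    ],
    "timeliness": ["T_fre", "T_vol"],
}


def feature_order():
    """Flatten the hierarchy into one fixed column order (hypothetical helper)."""
    return [name for names in FEATURES.values() for name in names]


def to_vector(values):
    """Turn a {variable name: value} mapping into a one-dimensional array."""
    return [float(values[name]) for name in feature_order()]
```

Flattening in a fixed order gives every user a vector of the same length (4 + 3 + 5 + 13 + 2 = 27 entries), which is what a regression model needs as input.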
Preferably, the user data comprises user basic data, teaching material basic data, teaching material label data, teaching material use behavior data, teaching material grading behavior data and teaching material comment behavior data;
the user basic data comprises a user id, a user name, a user role, a user gender, a user age, a located area, a school type, a section to be taught and a subject to be taught;
the teaching material basic data comprises a teaching material id, a teaching material name, a material form, a material purpose and a processing type;
the teaching material label data comprises a teaching material id, a label name and a label weight;
the teaching material use behavior data comprises use behavior id, users, use behavior actions, teaching materials, behavior time and behavior sources;
the teaching material grading behavior data comprise grading behavior id, users, teaching materials, grading score and behavior time;
the teaching material comment behavior data comprise evaluation behavior id, users, teaching materials, comment contents and behavior time.
Preferably, the behavior-dimension analysis method comprises descriptive statistical analysis and K-means cluster analysis, and is mainly used for calculating the following characteristic variables: picture richness, audio richness, video richness, animation richness, use diversity, processing type diversity, average usage, maximum usage, total self-usage, total student usage, usage pattern, average share quantity, average spread quantity, spread rate, average collection quantity, maximum collection quantity, average download quantity, maximum download quantity, recognition rate, average score, used category, update frequency and volatility;
the content-dimension analysis method comprises multidimensional scaling analysis and emotional tendency analysis, and is mainly used for calculating the theme diversity and comment emotional tendency characteristic variables;
the social-dimension analysis method comprises social network analysis, and is mainly used for calculating the used centrality and comment centrality characteristic variables.
Preferably, the S3 includes the steps of:
S31, selecting the user set according to the users' area, school type, section taught and subject taught; the user set is denoted $U\_teacher$ and its size $NU$; acquiring the teaching material processing capability matrix $X_i$ of each user in the user set to form the corresponding teaching material processing capability matrix set, denoted $X = (X_1, X_2, \ldots, X_i, \ldots, X_{NU})^T$, where $X_i \in U\_teacher$;
S32, acquiring the manually labeled capability label set of the user set $U\_teacher$, denoted $Y_{u\_teacher} = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_{NU})^T$, where $Y_i$ is the capability label of each user, $Y_i \in U\_teacher$;
The capability label $Y_i$ is determined from the user's self-labeling data and the expert labeling data. First, the error $e_i = |St_i - Se_i|$ between the user's self-labeling data $St_i$ and the first expert's labeling data $Se_i$ is calculated. If $e_i$ is less than a preset threshold $E$, $Y_i$ is the average of the two. If $e_i$ is greater than $E$, a second expert's labeling data $Sa_i$ is obtained, the distances from $Sa_i$ to $St_i$ and to $Se_i$ are calculated, and $Y_i$ is the average of $Sa_i$ and whichever of the two scores is closer to it. The calculation formula is as follows:
$$Y_i = \begin{cases} \dfrac{St_i + Se_i}{2}, & e_i < E \\[4pt] \dfrac{Sa_i + St_i}{2}, & e_i > E,\ |Sa_i - St_i| < |Sa_i - Se_i| \\[4pt] \dfrac{Sa_i + Se_i}{2}, & e_i > E,\ |Sa_i - Se_i| < |Sa_i - St_i| \end{cases}$$
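The labeling rule above can be sketched in a few lines of Python. The function name `capability_label` and its parameter names are hypothetical. The text does not say what happens when the error equals the threshold or when the second expert is equally far from both scores; the sketch sends the first case to the second expert and breaks the tie in favour of the first expert, and both choices are assumptions.

```python
def capability_label(st, se, threshold, sa=None):
    """Fuse self-labeling st and first-expert score se into one label.

    sa is the second expert's score; it is only needed when the two
    scores differ by at least `threshold` (boundary handling is an assumption).
    """
    if abs(st - se) < threshold:
        # Scores agree closely enough: average them.
        return (st + se) / 2.0
    if sa is None:
        raise ValueError("a second expert score is required for this user")
    # Pick whichever original score lies closer to the second expert.
    # Assumption: on an exact tie the first expert's score is used.
    closer = st if abs(sa - st) < abs(sa - se) else se
    return (sa + closer) / 2.0
```

For example, with a threshold of 5, scores of 60 (self) and 80 (first expert) disagree, so a second expert score of 78 yields the label (78 + 80) / 2 = 79.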
Preferably, the multiple regression models comprise a multiple linear regression model, a random forest regression model, a support vector machine regression model and a BP neural network regression model;
the multiple linear regression model is a linear regression model fitted by minimizing the sum of squared residuals between the value labels of the sample users and the predicted values of the linear model, and the calculation formula of the value labels is as follows:
wherein Y is the value label, C is a constant, R_picture is the picture richness characteristic variable, R_audio is the audio richness characteristic variable, R_video is the video richness characteristic variable, R_animation is the animation richness characteristic variable, D_use is the use diversity characteristic variable, D_process is the processing type diversity characteristic variable, D_topic is the theme diversity characteristic variable, U_average is the average usage characteristic variable, U_max is the maximum usage characteristic variable, U_self is the total self-usage characteristic variable, U_student is the total student usage characteristic variable, U_pattern is the usage pattern characteristic variable, Q_share is the average share quantity characteristic variable, Q_difference is the average spread quantity characteristic variable, Q_difference_rate is the spread rate characteristic variable, Q_collect is the average collection quantity characteristic variable, Q_mcollect is the maximum collection quantity characteristic variable, Q_download is the average download quantity characteristic variable, Q_mdownload is the maximum download quantity characteristic variable, Q_recognition is the recognition rate characteristic variable, Q_score is the average score characteristic variable, Q_udegree is the used centrality characteristic variable, Q_utype is the used category characteristic variable, Q_emotion is the comment emotional tendency characteristic variable, Q_cdegree is the comment centrality characteristic variable, T_fre is the update frequency characteristic variable, T_vol is the volatility characteristic variable, $\omega_1 \sim \omega_{26}$ are the weight coefficients obtained by training, and $\varepsilon$ is the error;
the random forest regression model is an ensemble model that uses CART decision trees as weak learners with randomly selected features; T weak learners are trained independently on T rounds of sampling, and the final result is obtained by combining the regression results of the T weak learners with a weighted average;
the support vector machine regression model maps the input teaching material processing capability matrix into a high-dimensional feature space through a kernel function to perform regression of the value label, and the calculation formula of the value label is as follows:
wherein Y is the value label, $\alpha_i^*$ and $\alpha_i$ are Lagrange coefficients, $x$ is the characteristic variable of the processing attributes of the input user teaching material, $x_i^T$ is the transposed form of the characteristic variable $x_i$, $K(x_i, x)$ is the kernel function, and $b$ is a constant;
the BP neural network regression model is a three-layer neural network with an input layer, a hidden layer and an output layer, each composed of several neurons; the input layer takes the 27 characteristic variables of the user's teaching material processing attributes, the hidden layer has 9 neurons, the output layer is 1 value label, and regression of the value label is achieved through full connection of the neurons.
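To make the 27-9-1 layout of the BP network concrete, here is a minimal NumPy sketch of the forward pass only. The text does not specify the activation function, the weight initialization or the training procedure, so the sigmoid hidden layer, the linear output, and the names `init_params` and `forward` are all assumptions made for illustration.

```python
import numpy as np


def init_params(n_in=27, n_hidden=9, seed=0):
    """Small random weights for a fully connected 27-9-1 network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),  # input -> hidden
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, 1)),     # hidden -> output
        "b2": np.zeros(1),
    }


def forward(params, X):
    """Predict one value label per row of X (shape: n_users x 27)."""
    # Assumption: sigmoid activation in the hidden layer.
    hidden = 1.0 / (1.0 + np.exp(-(X @ params["W1"] + params["b1"])))
    # Assumption: linear output neuron, as is common for regression.
    return (hidden @ params["W2"] + params["b2"]).ravel()
```

Training by back-propagation would adjust the 262 parameters (27 × 9 + 9 + 9 + 1) to reduce the error between predicted and labeled values; that step is omitted here.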
Preferably, the S4 includes the steps of:
dividing the sample data formed by the teaching material processing capability matrix set and the capability label set into k groups, taking 1 of the k groups as the test set each time and the remaining k-1 groups as the training set, and training the regression model k times in turn;
the evaluation metric of training is the mean absolute percentage error of the regression model, denoted MAPE and calculated as $\mathrm{MAPE} = \frac{1}{M}\sum_{j=1}^{M}\left|\frac{y'_j - y_j}{y_j}\right| \times 100\%$, wherein M is the number of users in the test set, $y'_j$ is the predicted value of the capability label of teacher j, and $y_j$ is the actual value of the capability label of teacher j;
comparing the evaluation results of the different regression models, and determining the regression model with the minimum MAPE value as the optimal regression model.
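The k-fold procedure and MAPE-based selection described above can be sketched with NumPy alone. Everything here is illustrative: the function names are hypothetical, the two candidate models (ordinary least squares and a constant-mean baseline) merely stand in for the four model families named in the text, and averaging MAPE over the k folds is an assumption about how the per-fold results are combined.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)


def fit_linear(X, y):
    """Least-squares linear regression with an intercept; returns a predictor."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def fit_mean(X, y):
    """Baseline that always predicts the training mean."""
    m = float(np.mean(y))
    return lambda Xn: np.full(len(Xn), m)


def cross_val_mape(fit, X, y, k=5, seed=0):
    """Hold out each of k folds once; average the k MAPE values (assumption)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        predict = fit(X[train], y[train])
        scores.append(mape(y[test], predict(X[test])))
    return float(np.mean(scores))


def select_best(models, X, y, k=5):
    """Return the name of the model with the smallest cross-validated MAPE."""
    scores = {name: cross_val_mape(fit, X, y, k) for name, fit in models.items()}
    return min(scores, key=scores.get), scores
```

A call such as `select_best({"linear": fit_linear, "mean": fit_mean}, X, y)` returns the winning model name together with every model's score, so the comparison stays inspectable. Note that MAPE is undefined when a true label is zero.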
Preferably, the method further comprises step S6:
collecting user update data at time t;
and dynamically updating the capability labels of the users based on the user updating data and the trained optimal regression model.
According to a second aspect of the present invention, there is provided a system for automatically identifying processing capability of a user teaching material based on multi-source data fusion, which is applied to a teaching platform supporting processing or management of the teaching material, and comprises:
the pre-defining module is used for pre-defining attributes of processing capacity of the user teaching materials and characteristic variables contained in each attribute;
the data acquisition module is used for acquiring user data from the teaching platform, performing multi-source data fusion on the user data from the behavior, content and social dimensions to determine the characteristic variable values of the attributes of each user's teaching material processing capability, and forming the one-dimensional array consisting of the values of all characteristic variables of all attributes into the teaching material processing capability matrix of that user;
the sample acquisition module is used for setting a screening condition to select a user set, acquiring a teaching material processing capacity matrix set corresponding to the user set and also acquiring a manually marked capacity label set of the user set;
the training module is used for constructing multiple regression models based on multiple machine learning methods, wherein each regression model outputs a recognized capability label from an input teaching material processing capability matrix, training the regression models with the teaching material processing capability matrix set and the capability label set, and determining an optimal regression model;
and the recognition module is used for dynamically recognizing the processing capacity of the user teaching materials by utilizing the trained optimal regression model.
In summary, the advantages and positive effects of the invention are:
(1) By making full use of the multi-dimensional process data in the education platform, intelligent automatic identification of the user's teaching material processing and handling capability can be realized, and the identification is more objective, more accurate and more continuous; extra time is needed only when the model is trained in advance, and identification with the trained model is fast and efficient.
(2) In addition, a time dimension is introduced and automatic dynamic updating and identification are supported, which facilitates large-scale, continuous evaluation work such as assessing users' teaching material processing and handling capability and teachers' information literacy.
(3) The accuracy of intelligent automatic identification can be further improved by optimizing the capability attribute/characteristic variable and the type of the collected data.
Drawings
Fig. 1 is a general flowchart of a dynamic evaluation method for the teaching material processing and handling capability of users based on multi-source data fusion according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 is a general flowchart of a method for automatically identifying processing capability of a user teaching material based on multi-source data fusion according to an embodiment of the present invention, where the method is applied to a teaching platform supporting processing or management of teaching materials, and the method includes the following steps:
S1, predefining the attributes of the user's teaching material processing capability and the characteristic variables contained in each attribute.
The attributes of the processing capacity of the teaching materials comprise richness, diversity, usability, usefulness and timeliness;
the richness is used for expressing quantity distribution characteristics of teaching materials in different file formats;
the diversity is used for representing the distribution characteristics of the purpose and the processing type of the teaching materials;
the usability is used for representing the use characteristics of the uploader of the teaching materials on the teaching materials;
the usefulness is used for representing the recognition characteristics of the teaching materials by other people except the uploader of the teaching materials;
the timeliness is used for representing the fluctuation characteristics of the updating frequency of the teaching materials;
the richness comprises 4 characteristic variables: picture richness R_picture, audio richness R_audio, video richness R_video and animation richness R_animation;
the picture richness R_picture is the log-standardized value of the number N_picture of picture teaching materials uploaded by a user; for any user i: $R\_picture_i = \log_{10}(N\_picture_i)$;
the audio richness R_audio is the log-standardized value of the number N_audio of audio teaching materials uploaded by a user; for any user i: $R\_audio_i = \log_{10}(N\_audio_i)$;
the video richness R_video is the log-standardized value of the number N_video of video teaching materials uploaded by a user; for any user i: $R\_video_i = \log_{10}(N\_video_i)$;
the animation richness R_animation is the log-standardized value of the number N_animation of animation teaching materials uploaded by a user; for any user i: $R\_animation_i = \log_{10}(N\_animation_i)$;
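The four richness variables share one formula, so a single helper covers them. The sketch below uses hypothetical names (`richness`, `richness_features`). The formula is undefined for a user who uploaded no material of a given format, and the text does not say how that case is handled; raising an error is an assumption.

```python
import math


def richness(count):
    """log10-standardized count of materials in one file format."""
    if count <= 0:
        # Assumption: the source formula gives no value for zero uploads.
        raise ValueError("richness is undefined for a count of zero")
    return math.log10(count)


def richness_features(counts):
    """counts maps a format name ('picture', 'audio', ...) to N_<format>."""
    return {"R_" + fmt: richness(n) for fmt, n in counts.items()}
```

The logarithm compresses large counts: a user with 1000 pictures scores 3, only three times the score of a user with 10.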
The diversity comprises 3 characteristic variables: use diversity D_use, processing type diversity D_process and theme diversity D_topic;
the use diversity D_use is the proportion of the number N_use of purposes covered by the teaching materials uploaded by the user to the total number Num_of_use of teaching material purposes; for any user i: $D\_use_i = \frac{N\_use_i}{Num\_of\_use}$;
the processing type diversity D_process is the proportion of the number N_process of processing types covered by the teaching materials uploaded by the user to the total number Num_of_process of teaching material processing types; for any user i: $D\_process_i = \frac{N\_process_i}{Num\_of\_process}$;
the theme diversity D_topic is the proportion of the number N_topic of themes covered by the teaching materials uploaded by the user to the total number Num_of_topics of teaching material themes; for any user i: $D\_topic_i = \frac{N\_topic_i}{Num\_of\_topics}$;
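Each diversity variable is the share of available categories that a user's uploads cover. A minimal sketch, with a hypothetical function name and made-up category names in the usage note:

```python
def diversity(user_categories, all_categories):
    """Fraction of the platform's categories covered by one user's uploads.

    user_categories: category of each uploaded material (repeats allowed).
    all_categories: every category defined on the platform.
    """
    all_set = set(all_categories)
    covered = set(user_categories) & all_set  # ignore unknown categories
    return len(covered) / len(all_set)
```

For instance, a user whose uploads fall into 2 of 4 defined purposes has a use diversity of 0.5; the same helper applies to processing types and themes.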
the usability comprises 5 characteristic variables: average usage U_average, maximum usage U_max, total self-usage U_self, total student usage U_student and usage pattern U_pattern (below, m indexes the teaching materials uploaded by user i);
the average usage U_average is the ratio of the sum of the usage U_each of all teaching materials uploaded by a user to the total number N_all of teaching materials uploaded by the user; for any user i: $U\_average_i = \frac{\sum_{m=1}^{N\_all_i} U\_each_{i,m}}{N\_all_i}$;
the maximum usage U_max is the log-standardized value of the maximum usage U_each among the teaching materials uploaded by a user; for any user i: $U\_max_i = \log_{10}\left(\max_m U\_each_{i,m}\right)$;
the total self-usage U_self is the log-standardized value of the sum of the user's own usage U_t_each of each teaching material uploaded by the user; for any user i: $U\_self_i = \log_{10}\left(\sum_{m=1}^{N\_all_i} U\_t\_each_{i,m}\right)$;
the total student usage U_student is the log-standardized value of the sum of the student usage U_s_each of each teaching material uploaded by the user; for any user i: $U\_student_i = \log_{10}\left(\sum_{m=1}^{N\_all_i} U\_s\_each_{i,m}\right)$;
the usage pattern U_pattern is the result of k-means clustering of the usage patterns of the teaching materials uploaded by all users of the teaching platform;
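The four numeric usability variables can be computed from per-material usage counts as sketched below. The record layout (keys `uses`, `self_uses`, `student_uses`) and the function name are assumptions made for illustration; the clustered usage pattern U_pattern is left out because the text does not describe the clustering inputs.

```python
import math


def usability_features(materials):
    """materials: one dict per uploaded material with usage counts.

    Expected keys (hypothetical layout): 'uses', 'self_uses', 'student_uses'.
    Sums and maxima must be positive for the logarithms to be defined.
    """
    n_all = len(materials)
    uses = [m["uses"] for m in materials]
    return {
        "U_average": sum(uses) / n_all,
        "U_max": math.log10(max(uses)),
        "U_self": math.log10(sum(m["self_uses"] for m in materials)),
        "U_student": math.log10(sum(m["student_uses"] for m in materials)),
    }
```

Only the maximum and the two totals are log-standardized; the average is a plain ratio, as in the definitions above.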
the usefulness comprises 13 characteristic variables: average share quantity Q_share, average spread quantity Q_difference, spread rate Q_difference_rate, average collection quantity Q_collect, maximum collection quantity Q_mcollect, average download quantity Q_download, maximum download quantity Q_mdownload, recognition rate Q_recognition, average score Q_score, used centrality Q_udegree, used category Q_utype, comment emotional tendency Q_emotion and comment centrality Q_cdegree (below, m indexes the teaching materials uploaded by user i);
the average share quantity Q_share is the ratio of the sum of the share quantities Q_share_each of all teaching materials uploaded by a user to the total number N_all of teaching materials uploaded by the user; for any user i: $Q\_share_i = \frac{\sum_{m=1}^{N\_all_i} Q\_share\_each_{i,m}}{N\_all_i}$;
the average spread quantity Q_difference is the ratio of the sum of the numbers of views Q_difference_each that each teaching material uploaded by the user received through sharing links to the total number N_all of teaching materials uploaded by the user; for any user i: $Q\_difference_i = \frac{\sum_{m=1}^{N\_all_i} Q\_difference\_each_{i,m}}{N\_all_i}$;
the spread rate Q_difference_rate is the ratio of the average spread quantity Q_difference of the teaching materials uploaded by the user to their average share quantity Q_share; for any user i: $Q\_difference\_rate_i = \frac{Q\_difference_i}{Q\_share_i}$;
the average collection quantity Q_collect is the ratio of the sum of the collection quantities Q_collect_each of the teaching materials uploaded by the user to the total number N_all of teaching materials uploaded by the user; for any user i: $Q\_collect_i = \frac{\sum_{m=1}^{N\_all_i} Q\_collect\_each_{i,m}}{N\_all_i}$;
the maximum collection quantity Q_mcollect is the log-standardized value of the maximum collection quantity Q_collect_each among the teaching materials uploaded by a user; for any user i: $Q\_mcollect_i = \log_{10}\left(\max_m Q\_collect\_each_{i,m}\right)$;
the average download quantity Q_download is the ratio of the sum of the download quantities Q_download_each of all teaching materials uploaded by a user to the total number N_all of teaching materials uploaded by the user; for any user i: $Q\_download_i = \frac{\sum_{m=1}^{N\_all_i} Q\_download\_each_{i,m}}{N\_all_i}$;
the maximum download quantity Q_mdownload is the log-standardized value of the maximum download quantity Q_download_each among the teaching materials uploaded by a user; for any user i: $Q\_mdownload_i = \log_{10}\left(\max_m Q\_download\_each_{i,m}\right)$;
the recognition rate Q_recognition is the ratio of the sum of the collection quantity Q_collect_each and the download quantity Q_download_each of each teaching material uploaded by a user to the number of views Q_browse_each of each teaching material; for any user i, with m indexing the teaching materials uploaded by user i: $Q\_recognition_i = \frac{\sum_{m=1}^{N\_all_i} \left(Q\_collect\_each_{i,m} + Q\_download\_each_{i,m}\right)}{\sum_{m=1}^{N\_all_i} Q\_browse\_each_{i,m}}$;
the average score Q_score is the ratio of the sum of the scores Q_score_each of all materials uploaded by a user to the total number N_all of materials uploaded by the user; for any user i: $Q\_score_i = \frac{\sum_{m=1}^{N\_all_i} Q\_score\_each_{i,m}}{N\_all_i}$;
the used centrality Q_udegree is the ratio of the total number U_use of other users who have used the teaching materials uploaded by the user to the number of users U minus one; for any user i: $Q\_udegree_i = \frac{U\_use_i}{U - 1}$;
the used category Q_utype is the result of k-means clustering of the ways in which the teaching materials uploaded by all users of the teaching platform are used;
the comment emotional tendency Q_emotion is the ratio of the sum of the positive-sentiment comments Q_emotion_each on all materials uploaded by the user to the total number N_all of materials uploaded by the user; for any user i: $Q\_emotion_i = \frac{\sum_{m=1}^{N\_all_i} Q\_emotion\_each_{i,m}}{N\_all_i}$;
the comment centrality Q_cdegree is the ratio of the total number U_comment of other users who have commented on the teaching materials uploaded by the user to the number of users U minus one; for any user i: $Q\_cdegree_i = \frac{U\_comment_i}{U - 1}$;
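Most usefulness variables reduce to three small operations: a per-material average, a log-standardized maximum, and a degree-centrality ratio. The helper names below are hypothetical, and the sketch covers only these building blocks, not the clustering or sentiment analysis steps.

```python
import math


def average_per_material(counts):
    """Sum of a per-material count divided by the number of materials."""
    return sum(counts) / len(counts)


def log_max(counts):
    """log10 of the largest per-material count (must be positive)."""
    return math.log10(max(counts))


def degree_centrality(n_linked_users, n_platform_users):
    """Share of the other platform users linked to this user's materials."""
    return n_linked_users / (n_platform_users - 1)
```

Dividing by the number of users minus one normalizes the centrality to the range 0 to 1, since a user can be linked to at most every other user on the platform.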
the timeliness comprises 2 characteristic variables: update frequency T_fre and volatility T_vol;
the update frequency T_fre is the average number N_time of teaching materials uploaded by the user per time period t within the time span T; for any user i: $T\_fre_i = \frac{1}{T}\sum_{t=1}^{T} N\_time_{i,t}$;
the volatility T_vol refers to the ratio of the teaching materials N_time uploaded by the user in each time period t within the time span T to the reference percentage $B_t$ of teaching materials in period t, and the calculation formula of the volatility of the teaching materials of any user i is as follows:
in the embodiment of the invention, T is 12, and the reference percentages of teaching materials provided over the 12 months are: B = {8%, 10%, 8%, 8%, 8%, 8%, 8%, 10%, 8%, 8%, 8%};
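The update frequency is a plain average over periods and can be sketched as below; the function name and the sample upload counts are made up for illustration. Volatility is not sketched because its formula is not recoverable from the text.

```python
def update_frequency(uploads_per_period):
    """Average number of uploads per period; one entry per period t."""
    return sum(uploads_per_period) / len(uploads_per_period)


# Twelve monthly upload counts for one hypothetical user (T = 12).
monthly_uploads = [2, 0, 1, 3, 0, 0, 4, 1, 0, 0, 1, 0]
t_fre = update_frequency(monthly_uploads)
```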
S2, collecting user data from the teaching platform, performing multi-source data fusion on the user data with analysis methods based on the behavior, content and social dimensions to determine the characteristic variable values of the attributes of each user's teaching material processing capability, wherein the one-dimensional array formed by the values of all characteristic variables of all attributes constitutes the teaching material processing capability matrix of that user.
The behavior-dimension analysis method comprises descriptive statistical analysis and K-means cluster analysis, and is mainly used for calculating the following characteristic variables: picture richness, audio richness, video richness, animation richness, use diversity, processing type diversity, average usage, maximum usage, total self-usage, total student usage, usage pattern, average share quantity, average spread quantity, spread rate, average collection quantity, maximum collection quantity, average download quantity, maximum download quantity, recognition rate, average score, used category, update frequency and volatility.
The content-dimension analysis method comprises multidimensional scaling analysis and emotional tendency analysis, and is mainly used for calculating the theme diversity and comment emotional tendency characteristic variables;
the social-dimension analysis method comprises social network analysis, and is mainly used for calculating the used centrality and comment centrality characteristic variables. The teaching platform comprises education application support platforms such as a regional education resource public service platform, an online teaching platform, an online research and training platform, an online training platform and an education management platform; the embodiment of the invention uses the network learning space of the Z-province education resource public service platform, and the data were collected on August 30, 2019;
the user data comprise user basic data, teaching material label data, teaching material usage behavior data, teaching material scoring behavior data and teaching material comment behavior data;
the user basic data comprise a user id, a user name, a user role, a user gender, a user age, the area where the user is located, a school category, the school section taught and the subject taught, and can be represented by U = (U_id, U_name, U_type, U_gender, U_age, U_area, U_school, U_section, U_subject);
the value range of the user role U_type comprises teacher, student and other, and can be expressed as: U_type = {u_teacher, u_student, u_other};
the value range of the user gender U_gender is {0, 1}, where 0 represents female and 1 represents male;
the value range of the school category U_school comprises city, county and town, and can be expressed as: U_school = {u_city, u_town, u_count};
the value range of the school section taught U_section comprises primary school, junior high school and senior high school, and can be expressed as U_section = {u_primary, u_junior, u_high, 0}, where for users with the student or other role the school section taught can only take the value 0;
the value range of the subject taught U_subject comprises Chinese, mathematics, English, physics, chemistry, biology, history, politics, geography, society, science, sports, music, art, health, legal, information technology, comprehensive practice and none, and can be expressed as U_subject = {U_Chinese, U_math, U_English, U_physics, U_chemistry, U_biology, U_history, U_polarity, U_geometry, U_society, U_science, U_sports, U_labor, U_information technology, U_comprehensive action, 0}, where for users with the student or other role the subject taught can only take the value 0;
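The user basic data tuple and its value ranges can be sketched as a small typed record. This is only an illustration of the constraints stated above (gender in {0, 1}; section and subject forced to 0 for non-teachers); the class names, field types and the example values for name and area are assumptions, not part of the described embodiment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Role(Enum):
    TEACHER = "u_teacher"
    STUDENT = "u_student"
    OTHER = "u_other"


class Section(Enum):
    PRIMARY = "u_primary"
    JUNIOR = "u_junior"
    HIGH = "u_high"
    NONE = 0  # used for students and other roles


@dataclass(frozen=True)
class User:
    """Hypothetical container for U = (U_id, ..., U_subject)."""
    u_id: int
    u_name: str
    u_type: Role
    u_gender: int                 # 0 = female, 1 = male
    u_age: int
    u_area: str
    u_school: str                 # e.g. "u_city"
    u_section: Section
    u_subject: Union[str, int]    # subject code, or 0 when not applicable

    def __post_init__(self):
        if self.u_gender not in (0, 1):
            raise ValueError("U_gender must be 0 or 1")
        if self.u_type is not Role.TEACHER:
            # Students and other roles may only carry the value 0 here.
            if self.u_section is not Section.NONE or self.u_subject != 0:
                raise ValueError("non-teachers must have section 0 and subject 0")


# Illustrative record: a 34-year-old female primary-school mathematics teacher.
teacher = User(1, "Teacher 1", Role.TEACHER, 0, 34, "S City",
               "u_city", Section.PRIMARY, "U_math")
print(teacher.u_type.value, teacher.u_section.value)
```

Validating at construction time keeps invalid combinations out of later feature computations.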
Table 1 shows a partial example of the basic data collection results for users whose role is teacher, where the total number of users whose role is teacher is 10625;
TABLE 1 User basic data collection results (partial) example for users whose role is teacher
Teacher 1 is Teacher Li, a 34-year-old female mathematics teacher at a primary school in a city of S City, Z Province;
teacher 2 is a 33-year-old male biology teacher at a junior middle school in a town of S City, Z Province;
teacher 10625 is Teacher Zhao, a 45-year-old male science teacher in D City, Z Province;
the teaching material basic data comprise a teaching material id, a teaching material name, a material form, a material use and a processing type, and can be represented by M = (M_id, M_name, M_format, M_use, M_type);
the value range of the teaching material form M_format comprises picture, audio, video and animation, and can be expressed as: M_format = {m_picture, m_audio, m_video, m_animation};
the value range of the teaching material use M_use comprises pre-class preview, in-class use and post-class review, and can be expressed as: M_use = {m_before, m_in, m_after}, where the total number of teaching material uses Num_of_use is 3;
the value range of the teaching material processing type M_type comprises conversion, beautification, selection and integration, and can be expressed as: M_type = {m_convert, m_embellish, m_excerpt, m_integration}, where the total number of teaching material processing types Num_of_processes is 4;
Table 2 is a partial example of the teaching material basic data collection results provided in the embodiment of the invention, where the total number of teaching materials is 95348;
TABLE 2 Teaching material basic data collection results (partial) example
M_id | M_name | M_format | M_use | M_type |
1 | Small | m_picture | m_before | m_convert |
2 | I | m_picture | m_in | m_integration |
... | ... | ... | ... | ... |
95348 | Lesson | m_video | m_in | m_excerpt |
The teaching material with M_id 1 is a picture teaching material that is used for pre-class preview after conversion processing;
the teaching material with M_id 2 is a picture teaching material that is used in class after integration processing;
the teaching material with M_id 95348 is a video teaching material that is used in class after selection (excerpt) processing;
the teaching material label data comprise a teaching material id, a label name and a label weight, and can be represented by L = (M_id, L_name, L_weight);
the label weight L_weight is the number of times the label has been applied, and its value range is [0, +∞);
Table 3 is a partial example of the teaching material label data collection results provided in the embodiment of the invention, where the total number of teaching material labels is 543325;
TABLE 3 Teaching material label data collection results (partial) example
M_id | L_name | L_weight |
1 | Geometry | 5 |
2 | Circular shape | 2 |
... | ... | ... |
95348 | Mathematics | 10 |
The teaching material with M_id 1 is labeled "Geometry" 5 times and "Circular shape" 2 times;
the teaching material with M_id 95348 is labeled "Mathematics" 10 times;
the teaching material usage behavior data are procedural data on users uploading, browsing, collecting, downloading, using and sharing teaching materials; they comprise a usage behavior id, a user, a usage behavior action, a teaching material, a behavior time and a behavior source, and can be represented by B = (B_id, U, B_action, M, Time, B_source);
the value range of the usage behavior action B_action comprises uploading, browsing, collecting, downloading, using and sharing, and can be expressed as: B_action = {b_upload, b_browse, b_collect, b_download, b_use, b_share};
the value range of the behavior source B_source comprises search, share and other, and can be expressed as: B_source = {b_searched, b_shared, b_other};
Table 4 is a partial example of the collection results of procedural usage behavior data (uploading, browsing, collecting, downloading, using and sharing teaching materials) provided in the embodiment of the invention, where there are 406576 usage behavior records in total;
TABLE 4 Teaching material usage behavior data collection results (partial) example
The usage behavior with B_id 1 is that the user with U_id 1 browsed the teaching material with M_id 198, reached through search, at 07:07:03 on September 1, 2018;
the usage behavior with B_id 2 is that the user with U_id 1 used the teaching material with M_id 198 at 07:08:21 on September 1, 2018;
the usage behavior with B_id 406576 is that the user with U_id 269 browsed the teaching material with M_id 1376 at 23:00:13 on August 30, 2019;
the teaching material scoring behavior data comprise a scoring behavior id, a user, a teaching material, a score and a behavior time, and can be represented by S = (S_id, U, M, S_score, Time);
the value range of the score S_score is [0, 5];
Table 5 is a partial example of the collection results of users' scoring behavior data on teaching materials provided in the embodiment of the invention, where there are 107613 scoring behavior records in total;
TABLE 5 Teaching material scoring behavior data collection results (partial) example
The scoring behavior with S_id 1 is that the user with U_id 1 gave the teaching material with M_id 18 a score of 2 at 21:22:20 on September 3, 2018;
the scoring behavior with S_id 2 is that the user with U_id 1 gave the teaching material with M_id 1958 a score of 5 at 14:07:23 on September 18, 2018;
the scoring behavior with S_id 107613 is that the user with U_id 2239 gave the teaching material with M_id 18723 a score of 4.5 at 23:58:14 on August 23, 2019;
the teaching material comment behavior data comprise a comment behavior id, a user, a teaching material, comment content and a behavior time, and can be represented by C = (C_id, U, M, C_comment, Time);
Table 6 is a partial example of the collection results of users' comment behavior data on teaching materials provided in the embodiment of the invention, where there are 252123 comment behavior records in total;
TABLE 6 Teaching material comment behavior data collection results (partial) example
C_id | U | M | C_comment | Time |
1 | U_id=1765 | M_id=1 | Is very helpful | 2018-09-01 14:12:25 |
2 | U_id=8872 | M_id=1 | Is not clear | 2018-09-02 12:17:03 |
... | ... | ... | ... | ... |
252123 | U_id=22 | M_id=91121 | Support for | 2019-07-31 14:38:04 |
The comment behavior with C_id 1 is that the user with U_id 1765 commented "Is very helpful" on the teaching material with M_id 1 at 14:12:20 on September 1, 2018;
the comment behavior with C_id 2 is that the user with U_id 8872 commented "Is not clear" on the teaching material with M_id 1 at 12:17:03 on September 2, 2018;
the comment behavior with C_id 252123 is that the user with U_id 22 commented "Support for" on the teaching material with M_id 91121 at 14:38:04 on July 31, 2019;
the intermediate variables involved in calculating the values of the characteristic variables of each user's teaching material processing capability attributes comprise:
the number of topics of the teaching platform Num_of_topic, the usage pattern of teaching resources U_pattern and the used category of teaching resources Q_utype;
for each user i, the total number of uploaded teaching materials N_all_i, the number of picture teaching materials N_picture_i, the number of audio teaching materials N_audio_i, the number of video teaching materials N_video_i, the number of animation teaching materials N_animation_i, the number of uses of the teaching materials N_use_i, the number of processing types of the teaching materials N_process_i, the number of topics of the teaching materials N_topic_i, the total number of other users who used the teaching materials U_use_i and the total number of users who commented on the teaching materials U_comment_i;
for each teaching material n uploaded by user i, the usage amount U_each_{i,n}, the self-usage amount U_teach_{i,n}, the student usage amount U_seach_{i,n}, the shared amount Q_share_each_{i,n}, the amount browsed through the sharing link Q_difference_each_{i,n}, the browsed amount Q_browse_each_{i,n}, the collected amount Q_collect_each_{i,n}, the downloaded amount Q_download_each_{i,n}, the score Q_score_each_{i,n} and the positive-sentiment comments Q_observation_each_{i,n};
and the number of teaching materials N_time_{i,t} uploaded by user i in each time period t within the time window T;
The number of topics of the teaching platform Num_of_topic is obtained by multidimensional scaling analysis of the teaching material label network, and its value is 20 in the embodiment of the invention;
the teaching material label network is an undirected network and can be represented by Gl = (L, El), where L represents all labels and El represents the co-occurrence relationships among the labels;
the teaching resource usage pattern U_pattern is the K-means clustering result based on the user's average usage amount, maximum usage amount, self-usage total and student usage total, where the selected K value is 4;
the used category Q_utype of the teaching resources is the K-means clustering result based on the user's average share amount, average spread amount, average collection amount, maximum collection amount, average download amount and maximum download amount, where the selected K value is 4;
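To make the clustering step concrete, the sketch below runs a plain Lloyd-style K-means with K = 4 on four-dimensional usage vectors (average usage, maximum usage, self-usage total, student usage total). The data values, the deterministic "first K points" initialisation and the helper name are assumptions for illustration; the description does not state which K-means variant or initialisation was used.

```python
def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm with squared Euclidean distance.

    Initial centroids are simply the first k points (an assumption made
    here so that the example is deterministic).
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(iterations):
        new_assignment = []
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_assignment.append(distances.index(min(distances)))
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical users: (average usage, maximum usage, self total, student total)
usage_features = [
    (0.5, 1.0, 0.4, 0.1), (2.0, 2.5, 0.3, 2.2),
    (2.0, 2.5, 2.2, 0.3), (0.6, 2.8, 0.5, 0.5),
    (0.6, 1.1, 0.5, 0.1), (2.1, 2.6, 0.3, 2.3),
    (1.9, 2.4, 2.1, 0.3), (0.7, 2.9, 0.5, 0.6),
]
labels, centers = kmeans(usage_features, k=4)
print(labels)  # cluster index per user, usable as a U_pattern-style category
```

The cluster index assigned to each user then serves as a categorical feature value, in the same way the description reports that teacher 1's usage pattern "belongs to the first class".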
the total number N_all_i of teaching materials uploaded by user i is calculated as: N_all_i = |{B | B_action = b_upload, U = i}|;
the number N_picture_i of picture teaching materials uploaded by user i is calculated as: N_picture_i = |{B | B_action = b_upload, U = i, M_format = m_picture}|;
the number N_audio_i of audio teaching materials uploaded by user i is calculated as: N_audio_i = |{B | B_action = b_upload, U = i, M_format = m_audio}|;
the number N_video_i of video teaching materials uploaded by user i is calculated as: N_video_i = |{B | B_action = b_upload, U = i, M_format = m_video}|;
the number N_animation_i of animation teaching materials uploaded by user i is calculated as: N_animation_i = |{B | B_action = b_upload, U = i, M_format = m_animation}|;
the number N_use_i of uses of the teaching materials uploaded by user i is calculated as: N_use_i = |{M_use | B_action = b_upload, U = i}|;
the number N_process_i of processing types of the teaching materials uploaded by user i is calculated as: N_process_i = |{M_process | B_action = b_upload, U = i}|;
The number N_topic_i of topics of the teaching materials uploaded by user i is determined by counting the platform topics to which the labels of those materials belong;
the total number U_use_i of other users who used the teaching materials uploaded by user i is obtained from the relative in-degree centrality of the user usage network of the teaching platform;
the user usage network is a directed network and can be represented by Gu = (U, Eu), where U represents all users and Eu represents that user i has used the teaching resources of user j;
the total number U_comment_i of users who commented on the teaching materials uploaded by user i is obtained from the relative in-degree centrality of the user comment network of the teaching platform;
the user comment network is a directed network and can be represented by Gc = (U, Ec), where U represents all users and Ec represents that user i has commented on the teaching resources of user j;
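The two directed networks above can be sketched as edge lists, with relative in-degree centrality computed per user. The normalisation used here, in-degree divided by (N − 1), is the usual definition of normalised degree centrality and is an assumption; the description does not spell out the normalisation. The function name and the toy edges are hypothetical.

```python
def relative_in_degree(users, edges):
    """Relative in-degree centrality for a directed network.

    `edges` holds (i, j) pairs meaning "user i used (or commented on) the
    teaching resources of user j". Repeated pairs and self-loops are ignored,
    so the in-degree counts distinct other users pointing at j.
    """
    users = list(users)
    if len(users) < 2:
        raise ValueError("need at least two users")
    incoming = {u: set() for u in users}
    for src, dst in edges:
        if src != dst:
            incoming[dst].add(src)
    return {u: len(sources) / (len(users) - 1) for u, sources in incoming.items()}


# Toy usage network Gu: users 2 and 3 used user 1's materials, user 1 used user 4's.
centrality = relative_in_degree([1, 2, 3, 4], [(2, 1), (3, 1), (3, 1), (1, 4)])
print(centrality)
```

The same function applies unchanged to the comment network Gc by passing its edges instead.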
the usage amount U_each_{i,n} of teaching material n uploaded by user i is calculated as: U_each_{i,n} = |{B | B_action = b_use, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the self-usage amount U_teach_{i,n} of teaching material n uploaded by user i is calculated as: U_teach_{i,n} = |{B | B_action = b_use, M = n, U = i}|, where n ∈ {M | B_action = b_upload, U = i};
the student usage amount U_seach_{i,n} of teaching material n uploaded by user i is calculated as: U_seach_{i,n} = |{B | B_action = b_use, M = n, U_type = u_student}|, where n ∈ {M | B_action = b_upload, U = i};
the shared amount Q_share_each_{i,n} of teaching material n uploaded by user i is calculated as: Q_share_each_{i,n} = |{B | B_action = b_share, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the amount Q_difference_each_{i,n} browsed through the sharing link of teaching material n uploaded by user i is calculated as: Q_difference_each_{i,n} = |{B | B_action = b_use, B_source = b_shared, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the browsed amount Q_browse_each_{i,n} of teaching material n uploaded by user i is calculated as: Q_browse_each_{i,n} = |{B | B_action = b_browse, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the collected amount Q_collect_each_{i,n} of teaching material n uploaded by user i is calculated as: Q_collect_each_{i,n} = |{B | B_action = b_collect, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the downloaded amount Q_download_each_{i,n} of teaching material n uploaded by user i is calculated as: Q_download_each_{i,n} = |{B | B_action = b_download, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the score Q_score_each_{i,n} of teaching material n uploaded by user i is calculated as: Q_score_each_{i,n} = |{B | B_action = b_score, M = n}|, where n ∈ {M | B_action = b_upload, U = i};
the comment sentiment tendency Q_observation_each_{i,n} of teaching material n uploaded by user i is obtained through sentiment tendency analysis in natural language processing; when the sentiment tendency of a comment is analyzed as positive, the comment is regarded as expressing positive sentiment and is counted as 1;
the number N_time_{i,t} of teaching materials uploaded by user i in each time period t within the time window T is calculated as: N_time_{i,t} = |{B | B_action = b_upload, B_time ∈ t, U = i}|;
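The set-cardinality formulas above amount to filtered counts over the usage behavior log B. The following sketch shows a few of them on a toy log; the record layout, the helper names, the `formats` lookup table and all data rows are hypothetical and only meant to illustrate the counting pattern.

```python
from collections import Counter, namedtuple

# One row of the usage behavior log B; field names are illustrative.
Behavior = namedtuple("Behavior", "b_id u_id action m_id period")


def uploads_of(log, user):
    """Upload records of one user: {B | B_action = b_upload, U = i}."""
    return [b for b in log if b.action == "b_upload" and b.u_id == user]


def n_all(log, user):
    """Total number of teaching materials uploaded by the user (N_all_i)."""
    return len(uploads_of(log, user))


def n_format(log, user, formats, wanted):
    """Uploads of the user whose material has the wanted M_format."""
    return sum(1 for b in uploads_of(log, user) if formats[b.m_id] == wanted)


def count_on_material(log, action, material):
    """Records of one action on material n, e.g. Q_download_each_{i,n}."""
    return sum(1 for b in log if b.action == action and b.m_id == material)


def n_time(log, user):
    """Uploads per time period t (N_time_{i,t})."""
    return Counter(b.period for b in uploads_of(log, user))


# Toy data, not taken from the embodiment.
formats = {1: "m_picture", 2: "m_video"}
log = [
    Behavior(1, 1, "b_upload", 1, "2018-09"),
    Behavior(2, 1, "b_upload", 2, "2018-10"),
    Behavior(3, 7, "b_download", 1, "2018-10"),
    Behavior(4, 8, "b_download", 1, "2018-11"),
    Behavior(5, 8, "b_use", 2, "2018-11"),
]
print(n_all(log, 1), n_format(log, 1, formats, "m_picture"))
print(count_on_material(log, "b_download", 1), dict(n_time(log, 1)))
```

In a production setting these counts would more likely be expressed as grouped queries over the behavior table; the list comprehensions here simply mirror the set notation.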
Tables 7 and 8 are examples of the values of the teaching material processing capability attributes and attribute characteristic variables provided in the embodiment of the invention, where Table 7 gives the overall values for the teaching materials uploaded by a user and Table 8 gives the specific values for each teaching material uploaded by the user;
TABLE 7 Overall values for teaching materials uploaded by users (partial) example
For user 1, the total number of uploaded teaching materials is 139, of which 119 are picture teaching materials, 4 are audio teaching materials, 16 are video teaching materials and none are animation teaching materials; the number of uses of the teaching materials is 2, the number of processing types is 2 and the number of topics is 1; the teaching materials were used by 20 other users and commented on by 12 users; and the numbers of teaching materials uploaded in each of the 12 months are 10, 46, 0, 0, 1, 0, 14, 42, 22, 4, 0 and 0, respectively;
TABLE 8 Specific values for each teaching material uploaded by users (partial) example
For user 1, the uploaded teaching material 1 has a usage amount of 3, a self-usage amount of 3, a student usage amount of 0, a shared amount of 3, no recorded browsing through the sharing link, no recorded browsing, a collected amount of 1, a downloaded amount of 30, a score of 4 and 5 positive-sentiment comments; the uploaded teaching material 139 has a usage amount of 100, a self-usage amount of 2, a student usage amount of 80, a shared amount of 1, no recorded browsing through the sharing link, no recorded browsing, a collected amount of 0, a downloaded amount of 63, a score of 4 and 2 positive-sentiment comments;
the teaching material processing capability matrix X_i of each user i is formed as: X_i = (R_picture_i, R_audio_i, R_video_i, D_use_i, D_process_i, D_topic_i, U_average_i, U_max_i, U_self_i, U_student_i, U_pattern_i, Q_share_i, Q_collect_i, Q_mcollect_i, Q_download_i, Q_mdownload_i, Q_score_i, Q_udegree_i, Q_utype_i, Q_emotion_i, Q_cdegree_i, T_fre_i, T_vol_i);
Taking teacher 1 as an example, the values of the teaching material processing capability evaluation matrix X_1 of teacher 1 are described below;
the picture richness of teacher 1 is: R_picture_1 = log10(N_picture_1) = log10(119) = 2.08;
the audio richness of teacher 1 is: R_audio_1 = log10(N_audio_1) = log10(4) = 0.60;
the video richness of teacher 1 is: R_video_1 = log10(N_video_1) = log10(16) = 1.20;
the usage pattern of teacher 1 belongs to the first class according to the clustering result;
the used centrality of teacher 1 is 0.035;
the used category of teacher 1 belongs to the third class according to the clustering result;
the comment centrality of teacher 1 is 0.003;
the multidimensional input matrix of teacher 1 in the embodiment of the invention is X_1 = (2.08, 0.60, 1.20, 0.67, 0.50, 0.05, 1.93, 2.00, 1.30, 2.33, 1, 0.07, 0.06, 0.00, 8.94, 1.80, 3.98, 0.035, 3, 0.51, 0.003, 11.58, 0.14);
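Several entries of X_1 can be reproduced directly from the counts reported for user 1. The richness values use the base-10 logarithm shown above. The diversity values are computed here as simple ratios (count divided by the platform-wide total), which is an assumption that happens to match the reported 0.67, 0.50 and 0.05; the constant and function names are illustrative.

```python
import math

# Platform-wide totals stated in the description.
NUM_OF_USE = 3       # pre-class, in-class, post-class
NUM_OF_PROCESS = 4   # convert, embellish, excerpt, integrate
NUM_OF_TOPIC = 20    # topics found by multidimensional scaling

# Counts reported for user 1.
counts = {"picture": 119, "audio": 4, "video": 16,
          "use": 2, "process": 2, "topic": 1}


def richness(n):
    """log10 of a positive material count; zero counts have no defined value."""
    if n <= 0:
        raise ValueError("richness is only defined for a positive count")
    return math.log10(n)


features = {
    "R_picture": richness(counts["picture"]),
    "R_audio": richness(counts["audio"]),
    "R_video": richness(counts["video"]),
    # Assumed ratio form of the diversity variables.
    "D_use": counts["use"] / NUM_OF_USE,
    "D_process": counts["process"] / NUM_OF_PROCESS,
    "D_topic": counts["topic"] / NUM_OF_TOPIC,
}
for name, value in features.items():
    print(name, round(value, 2))
```

Rounded to two decimals, the six values are 2.08, 0.6, 1.2, 0.67, 0.5 and 0.05, matching the first six entries of X_1.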
Step S3: select a user set, acquire the set of teaching material processing capability matrices corresponding to the user set, and acquire the manually labeled capability label set of the user set.
Step S31: select the user set according to the dimensions of area, school category, school section taught and subject taught; denote the user set as U_teacher and the number of users in it as NU; acquire the teaching material processing capability matrix X_i of each user in the user set to form the corresponding set of teaching material processing capability matrices, denoted X, X = (X_1, X_2, ..., X_i, ..., X_NU)^T, where user i ∈ U_teacher;
The specific user set U_teacher constructed in the embodiment of the invention consists of the science users of all areas, all school categories and all school sections in Z Province, and its size NU is 5023;
in the embodiment of the invention, the teaching material processing capability evaluation input data set X_u_teacher of the user set U_teacher is the matrix composed of the matrices X_i of the 5023 science users of all areas, all school categories and all school sections in Z Province, i.e. X_u_teacher = (X_1, X_2, ..., X_5023)^T;
Step S32: acquire the manually labeled capability label set of the user set U_teacher, denoted Y_u_teacher, Y_u_teacher = (Y_1, Y_2, ..., Y_i, ..., Y_NU)^T, where Y_i is the capability label of each user i ∈ U_teacher;
The capability label Y_i is determined from the user's self-labeling data and expert labeling data. First, the error e_i = |St_i − Se_i| between the user's self-labeling data St_i and the first expert's labeling data Se_i is calculated. If e_i is less than the preset critical value E, the capability label Y_i is the average of the two. If e_i is greater than the critical value E, the second expert's labeling data Sa_i are obtained, the distances from Sa_i to St_i and to Se_i are calculated separately, and the capability label Y_i is the average of Sa_i and whichever of the two scores is closer to it. The calculation formula is as follows:
Y_i = (St_i + Se_i)/2, if e_i < E;
Y_i = (Sa_i + St_i)/2, if e_i > E and |Sa_i − St_i| < |Sa_i − Se_i|;
Y_i = (Sa_i + Se_i)/2, if e_i > E and |Sa_i − St_i| > |Sa_i − Se_i|;
Table 9 is a partial example of the users' teaching material processing capability value labels in the embodiment of the invention, where the critical value E is 20.
TABLE 9 User teaching material processing capability value labels (partial) example
For teacher 1, the self-evaluation score is St_1 = 100 and the expert evaluation score is Se_1 = 100, so the final teaching material processing score of teacher 1 is Y_1 = (100 + 100)/2 = 100; for teacher 2, the self-evaluation score is St_2 = 75 and the expert evaluation score is Se_2 = 100, so e_2 = |St_2 − Se_2| = |75 − 100| = 25 > E = 20; the evaluation score of the second expert is Sa_2 = 80, and since |80 − 100| > |80 − 75|, the final teaching material processing score of teacher 2 is Y_2 = (80 + 75)/2 = 77.5; for teacher 3, the self-evaluation score is St_3 = 100 and the expert evaluation score is Se_3 = 90, so the final teaching material processing score of teacher 3 is Y_3 = (100 + 90)/2 = 95; the final user teaching material processing capability score matrix is Y = (100, 77.5, ..., 95)^T;
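The label-fusion rule and the three worked examples can be checked with a short function. This is a sketch under two assumptions that the text leaves open: a disagreement exactly equal to E is routed to the second expert, and a tie between the two distances resolves in favour of the self-evaluation score. The function and parameter names are hypothetical.

```python
def capability_label(st, se, sa=None, threshold=20):
    """Fuse self-evaluation `st`, first-expert `se` and optional second-expert `sa`.

    - If |st - se| < threshold, return the mean of st and se.
    - Otherwise a second expert score is required; return the mean of `sa`
      and whichever of st / se lies closer to it.
    """
    if abs(st - se) < threshold:
        return (st + se) / 2
    if sa is None:
        raise ValueError("second expert score required when scores disagree")
    closer = st if abs(sa - st) <= abs(sa - se) else se
    return (sa + closer) / 2


print(capability_label(100, 100))        # 100.0  (teacher 1)
print(capability_label(75, 100, sa=80))  # 77.5   (teacher 2)
print(capability_label(100, 90))         # 95.0   (teacher 3)
```

Requiring the second score only when the first two disagree keeps the expert workload proportional to the number of contested labels.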
Step S4: construct regression models based on several machine learning methods, where each regression model outputs an identified capability label from an input teaching material processing capability matrix, and train the regression models with the set of teaching material processing capability matrices and the capability label set to determine the optimal regression model.
The regression models comprise a multiple linear regression model, a random forest regression model, a support vector machine regression model and a BP neural network regression model;
the multiple linear regression model is a linear regression model fitted by minimizing the sum of squared residuals between the value labels of the sample users and the values predicted by the linear model, and the value label is calculated as follows:
where Y is the value label, C is a constant, R_picture is the picture richness characteristic variable, R_audio is the audio richness characteristic variable, R_video is the video richness characteristic variable, R_animation is the animation richness characteristic variable, D_use is the usage diversity characteristic variable, D_process is the processing type diversity characteristic variable, D_topic is the topic diversity characteristic variable, U_average is the average usage characteristic variable, U_max is the maximum usage characteristic variable, U_self is the self-usage total characteristic variable, U_student is the student usage total characteristic variable, U_pattern is the usage pattern characteristic variable, Q_share is the average share characteristic variable, Q_dispersion is the average spread characteristic variable, Q_dispersion_rate is the spread rate characteristic variable, Q_collect is the average collection characteristic variable, Q_mcollect is the maximum collection characteristic variable, Q_download is the average download characteristic variable, Q_mdownload is the maximum download characteristic variable, Q_recognition is the recognition rate characteristic variable, Q_score is the average score characteristic variable, Q_udegree is the used centrality characteristic variable, Q_utype is the used category characteristic variable, Q_emotion is the comment sentiment characteristic variable, Q_cdegree is the comment centrality characteristic variable, T_fre is the update frequency characteristic variable, T_vol is the volatility characteristic variable, ω1 to ω26 are the weight coefficients obtained by training, and ε is the error;
the random forest regression model is an algorithm model that uses CART decision trees as weak learners with randomly selected features; T weak learners are trained independently on T rounds of sampling, and the final result is obtained by combining the regression results of the T weak learners with a weighted average;
the support vector machine regression model maps the input teaching material processing capability matrix into a high-dimensional feature space through a kernel function to perform the regression calculation of the value label, and the value label is calculated as follows:
where Y is the value label; α_i* and α_i are Lagrange coefficients; x is the input characteristic variable of the user's teaching material processing attributes; x_i^T is the transposed form of the characteristic variable x_i; K(x_i, x) is the kernel function; and b is a constant;
the BP neural network regression model is a three-layer neural network with an input layer, a hidden layer and an output layer, each composed of several neurons; the input layer takes the 27 characteristic variables of the user's teaching material processing attributes, the hidden layer has 9 neurons, and the output layer produces 1 value label; regression of the value label is achieved through full connection of the neurons;
the sample data formed by the set of teaching material processing capability matrices and the capability label set are divided into k groups; each time, 1 group of teachers is taken from the k groups as the test set and the remaining k−1 groups of teachers are used as the training set, and the regression model is trained k times in turn, where k is 10 in the embodiment of the invention;
the evaluation metric for training is the mean absolute percentage error of the regression model, denoted MAPE, and calculated as: MAPE = (1/M) · Σ_{j=1}^{M} |(y'_j − y_j) / y_j| × 100%, where M is the number of users in the test set sample, y'_j is the predicted value of the capability label of teacher j, and y_j is the actual value of the capability label of teacher j;
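The k-fold evaluation with MAPE can be sketched in a few lines without any machine learning library. The fold assignment by striding (no shuffling), the `fit` callback convention and the mean-predictor baseline are assumptions made for this illustration; any of the four regression models could be plugged in as `fit`.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return 100 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


def k_fold_indices(n, k):
    """Yield (train, test) index lists; sample j goes to fold j mod k."""
    if k < 2 or k > n:
        raise ValueError("k must satisfy 2 <= k <= n")
    for fold in range(k):
        test = list(range(fold, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


def cross_validated_mape(xs, ys, fit, k=10):
    """Average MAPE over k folds; `fit(train_x, train_y)` returns a predictor."""
    scores = []
    for train, test in k_fold_indices(len(ys), k):
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        predictions = [model(xs[j]) for j in test]
        scores.append(mape([ys[j] for j in test], predictions))
    return sum(scores) / len(scores)


def fit_mean(train_x, train_y):
    """Trivial baseline: always predict the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


# Hypothetical labels for 20 users; feature vectors are unused by the baseline.
ys = [100, 77.5, 95, 80, 90, 85, 70, 88, 92, 76,
      81, 97, 73, 84, 89, 91, 79, 86, 94, 82]
xs = [[0.0]] * len(ys)
print(round(cross_validated_mape(xs, ys, fit_mean, k=10), 2))
```

Comparing this average across candidate models and keeping the smallest value mirrors the model selection described in the next step.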
Step S5: compare the evaluation results of the different regression models, determine the regression model with the smallest MAPE value as the optimal regression model, and use it to dynamically identify the user's teaching material processing capability.
The average MAPE value of the four regression models in the embodiment of the invention is 10.76%, which confirms the overall validity of the automatic identification regression models for user teaching material processing capability based on multi-source data fusion. Among them, the MAPE value of the model L1, which is based on multiple linear regression, is only 5.29%, so the model L1 based on multiple linear regression is finally selected as the optimal regression model, i.e. the final automatic identification regression model for user teaching material processing capability;
The optimal regression model L1 takes picture richness R_picture, audio richness R_audio, video richness R_video, usage diversity D_use, topic diversity D_topic, processing type diversity D_process, average usage U_average, maximum usage U_max, self-usage total U_self, student usage total U_student, usage pattern U_pattern, average share amount Q_share, average collection amount Q_collect, maximum collection amount Q_mcollect, average download amount Q_download, maximum download amount Q_mdownload, average score Q_score, used centrality Q_udegree, used category Q_utype, comment sentiment Q_emotion, comment centrality Q_cdegree, update frequency T_fre and volatility T_vol as independent variables, and the user's teaching material processing capability score as the dependent variable, and performs stepwise regression analysis. After automatic selection by the model, 7 variables remain: picture richness, usage diversity, average usage, average share amount, average collection amount, maximum download amount and volatility. The R-squared value is 0.716, meaning that these 7 variables explain 71.6% of the variation in the final score. The model passes the F test (F = 34.208, p = 0.000 < 0.05), indicating that the model is valid. In addition, the model is checked for multicollinearity: all VIF values in the model are smaller than 5, so there is no collinearity problem; and the D-W value (D-W = 2.016) is close to 2, indicating that the model has no autocorrelation and that the sample data are not correlated with one another, so the model is sound. Table 10 shows the specific results of the stepwise regression model L1 of the embodiment of the invention.
TABLE 10 Specific results of the stepwise regression model L1 of the embodiment of the invention
The final regression equation is: final score Y_U_teacher = 41.680 + 6.685 × picture richness R_picture + 26.389 × usage diversity D_use + 16.463 × average usage U_average − 1.153 × average share amount Q_share + 19.064 × average collection amount Q_collect + 4.927 × maximum download amount Q_mdownload − 35.233 × volatility T_vol;
taking mathematics teacher 10626 (male) of a primary school in a certain city of Z province, who is not in the sample, as an example, the automatic identification result for teaching material processing capacity is explained as follows;
for teacher 10626, picture richness R_picture is 1.08, use diversity D_use is 1.00, average usage U_average is 0.54, average share Q_share is 3.93, average collection Q_collect is 0.00, maximum download Q_mdownload is 0.49, and volatility T_vol is 0.31;
according to the capability evaluation model L_{u_teacher}, the teaching material processing capacity score of user 10626 is automatically calculated to be 71.14.
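The worked example for teacher 10626 can be reproduced with a short Python sketch. The coefficients and feature values are copied from the regression equation and the example above; the function and variable names (`capability_score`, `teacher_10626`) are illustrative assumptions and do not come from the patent.

```python
# Intercept and coefficients of the final regression equation given in the text.
INTERCEPT = 41.680
COEFFICIENTS = {
    "R_picture": 6.685,     # picture richness
    "D_use": 26.389,        # use diversity
    "U_average": 16.463,    # average usage
    "Q_share": -1.153,      # average share
    "Q_collect": 19.064,    # average collection
    "Q_mdownload": 4.927,   # maximum download
    "T_vol": -35.233,       # volatility
}


def capability_score(features):
    """Evaluate the linear regression equation for one user's feature values."""
    return INTERCEPT + sum(w * features[name] for name, w in COEFFICIENTS.items())


# Feature values of teacher 10626 as listed in the example above.
teacher_10626 = {
    "R_picture": 1.08,
    "D_use": 1.00,
    "U_average": 0.54,
    "Q_share": 3.93,
    "Q_collect": 0.00,
    "Q_mdownload": 0.49,
    "T_vol": 0.31,
}

score = capability_score(teacher_10626)
print(round(score, 2))  # 71.14, the score stated in the text
```

Keeping the coefficients in a dictionary keyed by variable name makes it easy to check each term against the equation, at the cost of requiring every feature to be present in the input.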
Step S6: dynamically identifying the processing capacity of the user teaching materials by using the trained optimal regression model.
Collecting user update data at time t;
dynamically updating the capability label of the user based on the user update data and the trained optimal regression model;
taking half a year as the update period C, the multi-source process data on the teaching material processing capacity of teacher 10626 are updated. On 29 February 2020, teacher 10626 has picture richness R_picture of 1.28, use diversity D_use of 1.00, average usage U_average of 0.64, average share Q_share of 3.93, average collection Q_collect of 0.00, maximum download Q_mdownload of 0.49 and volatility T_vol of 0.37, from which the final score is calculated; on 30 August 2020, teacher 10626 has picture richness R_picture of 2.68, use diversity D_use of 1.00, average usage U_average of 0.81, average share Q_share of 3.23, average collection Q_collect of 0.01, maximum download Q_mdownload of 0.49 and volatility T_vol of 0.38, with a final score of 84.81;
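The dynamic update in step S6 amounts to re-applying the trained model to each new snapshot of a user's feature values. The sketch below assumes the linear equation from the embodiment is the trained optimal model. The snapshot dates and values are taken from the text, while `update_labels` and the dictionary layout are hypothetical names chosen for illustration.

```python
from datetime import date

# Trained model of the embodiment: intercept and weights of the regression equation.
INTERCEPT = 41.680
WEIGHTS = {
    "R_picture": 6.685, "D_use": 26.389, "U_average": 16.463, "Q_share": -1.153,
    "Q_collect": 19.064, "Q_mdownload": 4.927, "T_vol": -35.233,
}


def predict(features):
    """Score one feature snapshot with the trained linear model."""
    return INTERCEPT + sum(w * features[k] for k, w in WEIGHTS.items())


def update_labels(snapshots, model):
    """Re-score a user at every update time t, in chronological order."""
    return {t: round(model(x), 2) for t, x in sorted(snapshots.items())}


# Half-yearly snapshots of teacher 10626 (update period C = half a year).
snapshots = {
    date(2020, 2, 29): {"R_picture": 1.28, "D_use": 1.00, "U_average": 0.64,
                        "Q_share": 3.93, "Q_collect": 0.00,
                        "Q_mdownload": 0.49, "T_vol": 0.37},
    date(2020, 8, 30): {"R_picture": 2.68, "D_use": 1.00, "U_average": 0.81,
                        "Q_share": 3.23, "Q_collect": 0.01,
                        "Q_mdownload": 0.49, "T_vol": 0.38},
}

history = update_labels(snapshots, predict)
current_label = history[max(history)]  # label at the most recent update time
print(current_label)  # 84.81, the score stated in the text for 30 August 2020
```

Storing the whole history rather than only the latest label is a design choice of this sketch; it keeps earlier labels available for comparing periods.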
An embodiment of the invention provides a system for automatically identifying the processing capacity of user teaching materials based on multi-source data fusion, comprising:
a pre-defining module, used for predefining the attributes of the processing capacity of the user teaching materials and the characteristic variables contained in each attribute;
a data acquisition module, used for collecting user data from the teaching platform and performing multi-source data fusion across the behavior, content and social dimensions according to the user data to determine the values of the characteristic variables of the attributes of each user's teaching material processing capacity, wherein the one-dimensional array formed by the values of all characteristic variables of all attributes of each user's teaching material processing capacity constitutes that user's teaching material processing capacity matrix;
a sample acquisition module, used for setting a screening condition to select a user set, acquiring the teaching material processing capacity matrix set corresponding to the user set, and acquiring the manually labeled capability label set of the user set;
a training module, used for constructing multiple types of regression models, wherein each regression model outputs a recognized capability label according to an input teaching material processing capacity matrix, training the regression models by using the teaching material processing capacity matrix set and the capability label set, and determining the optimal regression model;
and a recognition module, used for dynamically identifying the processing capacity of the user teaching materials by using the trained optimal regression model.
The implementation principle and technical effect of the system are similar to those of the method, and are not described herein again.
It should be noted that, in any of the above embodiments, the steps of the method are not necessarily executed in the order of their sequence numbers; unless the execution logic requires a particular order, they may be executed in any other feasible order.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A method for automatically identifying the processing capacity of user teaching materials based on multi-source data fusion, applied to a teaching platform supporting the processing or management of teaching materials, characterized by comprising the following steps:
S1, predefining attributes of the processing capacity of the user teaching materials and the characteristic variables contained in each attribute;
S2, collecting user data from the teaching platform, performing multi-source data fusion according to the user data by analysis methods based on the behavior, content and social dimensions, and determining the values of the characteristic variables of the attributes of each user's teaching material processing capacity, wherein the one-dimensional array formed by the values of all characteristic variables of all attributes of each user's teaching material processing capacity constitutes that user's teaching material processing capacity matrix;
S3, selecting a user set, acquiring the teaching material processing capacity matrix set corresponding to the user set, and acquiring the manually labeled capability label set of the user set;
S4, constructing multiple regression models based on multiple machine learning methods, wherein each regression model outputs a recognized capability label according to an input teaching material processing capacity matrix, training the regression models by using the teaching material processing capacity matrix set and the capability label set, and determining the optimal regression model;
and S5, dynamically identifying the processing capacity of the user teaching materials by using the trained optimal regression model.
2. The method for automatically identifying the processing capability of the user teaching materials based on the multi-source data fusion as claimed in claim 1, wherein the attributes of the processing capability of the teaching materials include richness, diversity, usability, usefulness and timeliness;
the richness is used for expressing quantity distribution characteristics of teaching materials in different file formats;
the diversity is used for representing the distribution characteristics of the purpose and the processing type of the teaching materials;
the usability is used for representing the use characteristics of the uploader of the teaching materials on the teaching materials;
the usefulness is used for representing the recognition characteristics of the teaching materials by other people except the uploader of the teaching materials;
the timeliness is used for representing the fluctuation characteristics of the updating frequency of the teaching materials.
3. The method of claim 2, wherein the richness comprises 4 characteristic variables of picture richness, audio richness, video richness and animation richness;
the diversity comprises 3 characteristic variables of use diversity, processing type diversity and theme diversity;
the usability comprises 5 characteristic variables of average usage, maximum usage, total self-usage, total student usage and usage pattern;
the usefulness comprises 13 characteristic variables of average share quantity, average spread quantity, spread rate, average collection quantity, maximum collection quantity, average download quantity, maximum download quantity, recognition rate, average score, used centrality, used category, comment emotional tendency and comment centrality;
the timeliness comprises 2 characteristic variables of updating frequency and volatility.
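The attribute and variable structure of claims 2 and 3 can be written down as a small lookup table that also fixes the column order of each user's capacity array. This is a sketch: the attribute and variable names follow the claim wording, while `CAPABILITY_ATTRIBUTES`, `feature_order` and `to_vector` are hypothetical helper names.

```python
# Five attributes and their characteristic variables, as enumerated in claim 3.
CAPABILITY_ATTRIBUTES = {
    "richness": ["picture richness", "audio richness", "video richness",
                 "animation richness"],
    "diversity": ["use diversity", "processing type diversity", "theme diversity"],
    "usability": ["average usage", "maximum usage", "total self-usage",
                  "total student usage", "usage pattern"],
    "usefulness": ["average share quantity", "average spread quantity", "spread rate",
                   "average collection quantity", "maximum collection quantity",
                   "average download quantity", "maximum download quantity",
                   "recognition rate", "average score", "used centrality",
                   "used category", "comment emotional tendency",
                   "comment centrality"],
    "timeliness": ["update frequency", "volatility"],
}


def feature_order():
    """Flatten the attribute table into one fixed, ordered list of variables."""
    return [name for names in CAPABILITY_ATTRIBUTES.values() for name in names]


def to_vector(values):
    """Turn a {variable name: value} mapping into the one-dimensional array."""
    return [float(values[name]) for name in feature_order()]


print(len(feature_order()))  # 4 + 3 + 5 + 13 + 2 = 27 characteristic variables
```

A fixed ordering matters because every regression model in the later claims consumes the same array layout for all users.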
4. The method for automatically identifying the processing capacity of user teaching materials based on multi-source data fusion as claimed in claim 1, wherein the user data comprises user basic data, teaching material basic data, teaching material label data, teaching material use behavior data, teaching material scoring behavior data and teaching material comment behavior data;
the user basic data comprises a user id, a user name, a user role, a user gender, a user age, a region, a school type, a school stage taught and a subject taught;
the teaching material basic data comprises a teaching material id, a teaching material name, a material form, a material purpose and a processing type;
the teaching material label data comprises a teaching material id, a label name and a label weight;
the teaching material use behavior data comprises use behavior id, users, use behavior actions, teaching materials, behavior time and behavior sources;
the teaching material grading behavior data comprise grading behavior id, users, teaching materials, grading score and behavior time;
the teaching material comment behavior data comprise comment behavior id, users, teaching materials, comment contents and behavior time.
5. The method for automatically identifying the processing capacity of user teaching materials based on multi-source data fusion as claimed in claim 3, wherein the analysis method based on the behavior dimension comprises descriptive statistical analysis and K-means cluster analysis, and is mainly used for calculating the characteristic variables of picture richness, audio richness, video richness, animation richness, use diversity, processing type diversity, average usage, maximum usage, total self-usage, total student usage, usage pattern, average share quantity, average spread quantity, spread rate, average collection quantity, maximum collection quantity, average download quantity, maximum download quantity, recognition rate, average score, used category, update frequency and volatility;
the analysis method based on the content dimension comprises multidimensional scaling analysis and emotional tendency analysis, and is mainly used for calculating the characteristic variables of theme diversity and comment emotional tendency;
the analysis method based on the social dimension comprises social network analysis, and is mainly used for calculating the characteristic variables of used centrality and comment centrality.
6. The method for automatically identifying processing capability of user teaching materials based on multi-source data fusion as claimed in claim 1, wherein the step S3 includes the steps of:
S31, selecting the user set according to the dimensions of the region where the users are located, the school type, the school stage taught and the subject taught, recording the user set as U_teacher and the number of users in the set as NU, and acquiring the teaching material processing capacity matrix $X_i$ corresponding to each user in the user set to form the corresponding teaching material processing capacity matrix set, denoted X, $X = (X_1, X_2, \ldots, X_i, \ldots, X_{NU})^T$, wherein $X_i \in U\_teacher$;
S32, acquiring the manually labeled capability label set of the user set U_teacher, denoted $Y_{u\_teacher}$, $Y_{u\_teacher} = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_{NU})^T$, wherein $Y_i$ is the capability label of each user, $Y_i \in U\_teacher$;
the capability label $Y_i$ is determined according to the user self-labeling data and the expert labeling data. First, the error value $e_i = |St_i - Se_i|$ between the user self-labeling data $St_i$ and the first expert labeling data $Se_i$ is calculated. If $e_i$ is less than a preset critical value E, the capability label $Y_i$ is the average of the two; if $e_i$ is greater than the critical value E, second expert labeling data $Sa_i$ are obtained, the distances from $Sa_i$ to $St_i$ and to $Se_i$ are calculated separately, and the capability label $Y_i$ is the average of $Sa_i$ and whichever of the two scores is closer to it. The calculation formula is as follows:
$$Y_i = \begin{cases} \dfrac{St_i + Se_i}{2}, & e_i < E \\[2ex] \dfrac{Sa_i + S_i^{*}}{2}, & e_i > E \end{cases} \qquad S_i^{*} = \arg\min_{S \in \{St_i,\, Se_i\}} |Sa_i - S|$$
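The labeling rule of claim 6 can be sketched in Python as follows. The function name `capability_label` and its argument names are assumptions made for illustration. The claim does not say what happens when the error equals the critical value E or when both scores are equally far from the second expert's score; the sketch treats the first case as a disagreement and resolves the second in favour of the self-labeled score, and both choices are assumptions.

```python
def capability_label(self_score, expert_score, threshold, second_expert_score=None):
    """Combine self-labeled and expert-labeled scores into one capability label.

    self_score          -- user self-labeling data St_i
    expert_score        -- first expert labeling data Se_i
    threshold           -- critical value E
    second_expert_score -- second expert labeling data Sa_i, needed on disagreement
    """
    error = abs(self_score - expert_score)  # e_i = |St_i - Se_i|
    if error < threshold:
        # The two scores agree closely enough: take their average.
        return (self_score + expert_score) / 2
    # Assumption: error == threshold is handled like error > threshold.
    if second_expert_score is None:
        raise ValueError("a second expert score is required when the scores disagree")
    # Pick whichever earlier score lies closer to the second expert's score
    # (assumption: ties go to the self-labeled score, the first candidate).
    closer = min((self_score, expert_score),
                 key=lambda s: abs(second_expert_score - s))
    return (second_expert_score + closer) / 2


print(capability_label(80, 84, threshold=5))                           # 82.0
print(capability_label(70, 90, threshold=5, second_expert_score=85))   # 87.5
```

In the second call the two initial scores differ by 20, so the second expert's score of 85 is averaged with the closer score, 90.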
7. The method for automatically identifying the processing capacity of user teaching materials based on multi-source data fusion as claimed in claim 1, wherein the multiple regression models comprise a multiple linear regression model, a random forest regression model, a support vector machine regression model and a BP neural network regression model;
the multiple linear regression model is a linear regression model fitted by minimizing the sum of squared residuals between the value labels of the sample users and the predicted values of the linear model, and the calculation formula of the value labels is as follows:
wherein Y is the value label, C is a constant, R_picture is the picture richness characteristic variable, R_audio is the audio richness characteristic variable, R_video is the video richness characteristic variable, R_animation is the animation richness characteristic variable, D_use is the use diversity characteristic variable, D_process is the processing type diversity characteristic variable, D_topic is the theme diversity characteristic variable, U_average is the average usage characteristic variable, U_max is the maximum usage characteristic variable, U_self is the total self-usage characteristic variable, U_student is the total student usage characteristic variable, U_pattern is the usage pattern characteristic variable, Q_share is the average share quantity characteristic variable, Q_difference is the average spread quantity characteristic variable, Q_difference_rate is the spread rate characteristic variable, Q_collect is the average collection quantity characteristic variable, Q_mcollect is the maximum collection quantity characteristic variable, Q_download is the average download quantity characteristic variable, Q_mdownload is the maximum download quantity characteristic variable, Q_recognition is the recognition rate characteristic variable, Q_score is the average score characteristic variable, Q_udegree is the used centrality characteristic variable, Q_utype is the used category characteristic variable, Q_emotion is the comment emotional tendency characteristic variable, Q_cdegree is the comment centrality characteristic variable, T_fre is the update frequency characteristic variable, T_vol is the volatility characteristic variable, $\omega_1 \sim \omega_{26}$ are the weight coefficients obtained by training, and $\varepsilon$ is the error;
the random forest regression model is an algorithm model that uses CART decision trees as weak learners with randomly selected features, wherein T weak learners are trained independently through T rounds of sampling, and the final result is obtained by combining the regression results of the T weak learners with a weighted average;
the support vector machine regression model is used for mapping an input teaching material processing capacity matrix into a high-dimensional feature space through a kernel function to realize regression calculation of a value label, and the calculation formula of the value label is as follows:
wherein Y is the value label, $\alpha_i$ is a Lagrange coefficient, x is the input characteristic variable of the user teaching material processing attributes, $x_i^T$ is the transposed form of the characteristic variable $x_i$, K is the kernel function, and b is a constant;
the BP neural network regression model is a three-layer neural network with an input layer, a hidden layer and an output layer, each layer consisting of a plurality of neurons, wherein the input layer receives the 27 characteristic variables of the user teaching material processing attributes, the hidden layer has 9 neurons, the output layer outputs 1 value label, and regression of the value label is achieved through full connection of the neurons.
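To make the three-layer network of claim 7 concrete, the following pure-Python sketch runs a forward pass through a fully connected 27-9-1 network. It reads the claim as describing one hidden layer with 9 neurons, which is an interpretation. The sigmoid hidden activation, the linear output, the random initial weights and all names (`init_network`, `forward`) are assumptions, since the claim specifies none of them. Training by backpropagation is not shown.

```python
import math
import random

N_INPUT = 27   # characteristic variables of the teaching material attributes
N_HIDDEN = 9   # hidden-layer neurons (interpretation of the claim)


def init_network(seed=0):
    """Create small random weights for a 27-9-1 fully connected network."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUT)]
               for _ in range(N_HIDDEN)],
        "b1": [0.0] * N_HIDDEN,
        "W2": [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)],
        "b2": 0.0,
    }


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(net, x):
    """Map one 27-element feature array to a single regression output."""
    if len(x) != N_INPUT:
        raise ValueError(f"expected {N_INPUT} features, got {len(x)}")
    # Hidden layer: weighted sum of all inputs per neuron, then sigmoid (assumed).
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    # Output layer: one linear neuron (assumed) producing the value label.
    return sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"]


net = init_network()
prediction = forward(net, [0.5] * N_INPUT)  # untrained, so the value is arbitrary
```

With untrained weights the output carries no meaning; the sketch only shows how the 27 inputs flow through 9 hidden neurons to one output.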
8. The method for automatically identifying processing capability of user teaching materials based on multi-source data fusion as claimed in claim 1, wherein the step S4 includes the steps of:
dividing the sample data formed by the teaching material processing capacity matrix set and the capability label set into k groups, taking 1 of the k groups of sample data as the test set each time and the remaining k-1 groups as the training set, and training the regression model k times in turn;
the evaluation effect value of the training is the mean absolute percentage error of the regression model, denoted MAPE, which is calculated as: $MAPE = \frac{100\%}{M} \sum_{j=1}^{M} \left| \frac{y'_j - y_j}{y_j} \right|$, wherein M is the number of users in the test set sample, $y'_j$ is the predicted value of the capability label of teacher j, and $y_j$ is the actual value of the capability label of teacher j;
and comparing the evaluation effects of different regression models, and determining the regression model with the minimum MAPE value as the optimal regression model.
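The selection procedure of claim 8 (k-fold training, MAPE per model, smallest MAPE wins) can be sketched in pure Python. The two candidate models here, a mean predictor and a one-variable least-squares line, are toy stand-ins chosen to keep the example self-contained; they are not the four models named in claim 7. All function names and the synthetic data are illustrative assumptions.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i holds indices i, i+k, i+2k, ..."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def fit_mean(xs, ys):
    """Toy model 1: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Toy model 2: ordinary least squares on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def cross_validated_mape(fit, xs, ys, k):
    """Train k times, each time testing on the held-out group; average the MAPE."""
    errors = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mape([ys[i] for i in test], [model(xs[i]) for i in test]))
    return sum(errors) / len(errors)


def select_best(candidates, xs, ys, k=5):
    """Return the name of the candidate with the smallest cross-validated MAPE."""
    scores = {name: cross_validated_mape(fit, xs, ys, k)
              for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Synthetic data: labels depend linearly on one feature.
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + 1.0 for x in xs]
best, scores = select_best({"mean": fit_mean, "line": fit_line}, xs, ys, k=5)
print(best)  # line
```

Because MAPE divides by the actual label, it is undefined for labels equal to zero; the sketch assumes all capability labels are nonzero scores.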
9. The method for automatically identifying processing capability of user teaching materials based on multi-source data fusion as claimed in claim 1, further comprising step S6:
collecting user update data at time t;
and dynamically updating the capability labels of the users based on the user updating data and the trained optimal regression model.
10. A system for automatically identifying the processing capacity of user teaching materials based on multi-source data fusion, applied to a teaching platform supporting the processing or management of teaching materials, characterized by comprising:
a pre-defining module, used for predefining the attributes of the processing capacity of the user teaching materials and the characteristic variables contained in each attribute;
a data acquisition module, used for collecting user data from the teaching platform and performing multi-source data fusion across the behavior, content and social dimensions according to the user data to determine the values of the characteristic variables of the attributes of each user's teaching material processing capacity, wherein the one-dimensional array formed by the values of all characteristic variables of all attributes of each user's teaching material processing capacity constitutes that user's teaching material processing capacity matrix;
a sample acquisition module, used for setting a screening condition to select a user set, acquiring the teaching material processing capacity matrix set corresponding to the user set, and acquiring the manually labeled capability label set of the user set;
a training module, used for constructing multiple types of regression models, wherein each regression model outputs a recognized capability label according to an input teaching material processing capacity matrix, training the regression models by using the teaching material processing capacity matrix set and the capability label set, and determining the optimal regression model;
and a recognition module, used for dynamically identifying the processing capacity of the user teaching materials by using the trained optimal regression model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011583583.7A CN112699933B (en) | 2020-12-28 | 2020-12-28 | Automatic identification method and system for processing capability of user teaching materials |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112699933A (en) | 2021-04-23 |
CN112699933B (en) | 2023-07-07 |
Family
ID=75513027
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011583583.7A Active CN112699933B (en) | 2020-12-28 | 2020-12-28 | Automatic identification method and system for processing capability of user teaching materials |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112699933B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||