CN114429281B - Online learner activity degree evaluation method based on deep clustering algorithm - Google Patents
- Publication number
- CN114429281B (application CN202111649117.9A)
- Authority
- CN
- China
- Prior art keywords
- learner
- course
- online
- activity
- liveness
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/904—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
Abstract
The invention relates to the field of data analysis and provides an online learner activity evaluation method based on a deep clustering algorithm. The method uses database operations, event-tracking code (instrumentation) and web crawler technology to obtain multi-source information on learner attributes, online behavior and online resources from an online platform. According to the full-view learning theory, it comprehensively evaluates the activity of online learners from their interaction with the online platform, the learning content and other learners, and presents, through visualization, the activity distribution of learners across demographic, regional affiliation and platform registration attributes, providing a new approach for evaluating the application and service level of online platforms.
Description
Technical Field
The invention relates to the field of data analysis, in particular to an online learner activity evaluation method based on a deep clustering algorithm.
Background
With the deep integration of information technology and education, online education can meet learners' personalized and customized needs and has become a routine mode of education and teaching. The online learning platform is the main carrier of online education and retains a large amount of real, detailed data on learners' operation behaviors. Mining this sequence data can reveal the potential associations behind learner behaviors and the patterns related to learning activities. Exploring the platform activity of learners with different backgrounds gives effective insight into learners' behavior habits and patterns, helps optimize the online learning experience, increases online learning engagement and the level of resource use, and thereby improves learning quality. For an online learning platform, analyzing learner activity reveals learners' behavior patterns and their implicit evaluation of resources and services, and provides guidance for improving the platform and optimizing learner services.
Current research mainly measures and characterizes learner activity by the number of visits a learner makes within a period. This calculation is too simple and coarse: although it identifies some highly active learners, it misses learners who rank low on such statistics but whose activity is of high quality, for example learners who log in infrequently but interact frequently with other learners and are active on a regular basis. The importance of such users is difficult to characterize with simple statistics.
Disclosure of Invention
In view of the defects of, or the needs for improvement in, the prior art, the invention provides an online learner activity evaluation method based on a deep clustering algorithm. According to the full-view learning theory, the method comprehensively evaluates the activity of online learners from their interaction with the online platform, the learning content and other learners, and provides a new approach for evaluating the application and service level of online platforms.
The purpose of the invention is achieved by the following technical measures.
The invention provides an online learner activity evaluation method based on a deep clustering algorithm, which comprises the following steps:
S1. Collecting multi-source online learning information. Learner attribute information, including demographics, regional affiliation and platform registration, is collected from the online platform management system using database operations; online behavior information on platform login, course learning, assignment submission, forum communication and course resource evaluation is collected using event-tracking code (instrumentation); and online resource information on courses, assignments, topic posts and course resources is acquired from the online platform using web crawler technology.
S2. Constructing activity sequence features. Learner attribute, online behavior and online resource information is cleaned using database queries and descriptive statistical analysis; high-dimensional online behavior sequence features over different time periods are mined using time-domain descriptive statistics; and the sequence features are linearly transformed according to a data normalization rule so that all data share a uniform scale.
S3. Deep clustering modeling of activity. A cascaded autoencoder is constructed to extract low-dimensional key features that represent the implicit associations among learners' activity sequence features; the affinity propagation (neighbor propagation) algorithm is used to identify activity classes and assign learners to them; and, starting from the initial activity clustering, the autoencoder and the affinity propagation algorithm are trained jointly to optimize the activity clustering result.
S4. Analyzing the characteristics of activity classes. An association rule algorithm is used to compute the support, confidence and lift between the activity sequence features and each activity class and to find the typical features strongly associated with each class; text analysis is used to mine the semantic features of each class's typical features; and visualization is used to present the distribution of learner activity across demographic, regional affiliation and platform registration attributes.
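The support, confidence and lift named in S4 can be illustrated with a minimal Python sketch. The feature labels (`weekly_login_high`, `active`, and so on) and the helper `rule_metrics` are hypothetical and only serve the example; the patent does not specify how features are discretized into items.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    transactions: list of sets; each set holds one learner's discretized
    features plus an activity-class label (hypothetical encoding).
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Four illustrative learners (made-up data).
learners = [
    {"weekly_login_high", "active"},
    {"weekly_login_high", "active"},
    {"weekly_login_high", "inactive"},
    {"weekly_login_low", "inactive"},
]
support, confidence, lift = rule_metrics(
    learners, {"weekly_login_high"}, {"active"}
)
```

A lift above 1 indicates that the feature occurs together with the activity class more often than would be expected if the two were independent.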
The invention has the beneficial effects that:
Multi-source information on learner attributes, online behavior and online resources is obtained from the online platform using database operations, event-tracking code and web crawler technology. Based on the full-view learning theory, and from the perspective of learners' interaction with the online platform, the learning content and other learners, high-dimensional activity sequence features of the duration class and the frequency class are constructed for platform login, course learning, assignment submission, forum communication and course resource evaluation, across different time periods, course types and course resource types. An autoencoder network extracts low-dimensional key features that represent the associations among the high-dimensional activity sequence features; the affinity propagation algorithm performs the initial clustering of learner activity; and the autoencoder network and the affinity propagation algorithm are trained jointly, adjusting the autoencoder parameters to optimize the clustering result. An association rule algorithm finds the sequence features strongly associated with each activity class; text analysis computes the frequency of the strongly associated features and their co-occurrence frequency within each activity class to mine the semantic features of each class; and visualization presents the activity distribution of learners across demographic, regional affiliation and platform registration attributes.
The method makes it convenient for online platform managers to evaluate the platform's application and service level according to learner activity, gives direction for improving platform services and optimizing the learner experience, and thereby improves learning efficiency and learning quality.
Drawings
FIG. 1 is a schematic diagram of the online learner activity evaluation method based on a deep clustering algorithm according to the present invention.
FIG. 2 is a structure diagram of the multi-source online learning information in the present invention.
FIG. 3 is a schematic diagram of the deep clustering algorithm for online learner activity evaluation in the present invention.
Detailed Description
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the invention is described in further detail below with reference to the accompanying drawings and embodiments. The embodiments described herein only explain the invention and are not intended to limit it.
As shown in FIG. 1, an embodiment of the present invention provides an online learner activity evaluation method based on a deep clustering algorithm, comprising the following steps:
(1) Collecting multi-source online learning information.
As shown in FIG. 2, the information used to evaluate online learner activity in the present invention includes learner attributes, online behavior and online resources. The manner and details of collecting each type of information are described below.
(1-1) Learner attribute information collection. SQL statements are used to extract, from the database tables of the online platform management system, learners' demographic information (gender, age, grade and major), regional affiliation information (codes of the school, county, city and province) and platform registration information (learner ID and registration time). The learner ID is the index that links a learner's individual attributes with learning behavior and online resource information.
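As an illustration of step (1-1), the following Python sketch runs such an SQL query against an in-memory SQLite database. The table name `learner`, its column names and the sample row are assumptions made for the example; the schema of the actual platform management system is not given in the source.

```python
import sqlite3

# Hypothetical schema: one row per learner with demographic, regional
# and registration fields. Real table and column names will differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE learner (
           learner_id TEXT PRIMARY KEY, gender TEXT, age INTEGER,
           grade TEXT, major TEXT, school_code TEXT, county_code TEXT,
           city_code TEXT, province_code TEXT, registered_at TEXT)"""
)
conn.execute(
    "INSERT INTO learner VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("stu000001", "F", 20, "2", "Mathematics",
     "S01", "C01", "M01", "P01", "2021-09-01 08:00:00"),
)

# Extract the three attribute groups in one query, keyed by learner ID.
rows = conn.execute(
    """SELECT learner_id, gender, age, grade, major,
              school_code, county_code, city_code, province_code,
              registered_at
       FROM learner
       ORDER BY learner_id"""
).fetchall()
conn.close()
```

Each returned tuple starts with the learner ID, which is the key later used to join attributes with behavior logs.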
(1-2) Online behavior information collection. Event-tracking code, following an event model, collects online behavior information on learners' platform login, course learning, assignment submission, forum communication and course resource evaluation, forming a real-time behavior operation log. Each log record contains the learner ID, operation object, behavior event type, operation time and operation object ID; the learner ID is the index that links online behavior with learner attribute information. Table 1 shows sample records from the behavior operation log of a single learner.
Table 1. Example of a learner's online behavior operation log
Learner ID | Operation object | Behavior event type | Operation time | Operation object ID |
stu000001 | Platform | Login | T1 | 000000 |
stu000001 | Platform | Logout | T2 | 000000 |
stu000001 | Course | Learning start | T3 | clp000001 |
stu000001 | Course | Learning end | T4 | clp000001 |
stu000001 | Assignment | Assignment submission | T7 | hw000001 |
stu000001 | Topic post | Posting | T8 | tp000001 |
stu000001 | Topic post | Replying | T9 | tp000001 |
stu000001 | Course resource | Course resource comment | T10 | res000001 |
stu000001 | Course resource | Course resource like | T11 | res000001 |
stu000001 | Course resource | Course resource forwarding | T12 | res000001 |
stu000001 | Course resource | Course resource bookmarking | T13 | res000001 |
(1-2-1) The behavior event types comprise login and logout for the platform; learning start and learning end for course learning; submission for assignments; posting and replying for forum communication; and commenting, liking, forwarding and bookmarking for course resource evaluation.
(1-2-2) The operation objects comprise the platform, courses, assignments, topic posts and course resources. The operation object ID is drawn from the union of the IDs of these objects and is the index that links online behavior with online resource information.
(1-3) Online resource information collection. Web crawler technology is used to acquire online resource information on courses, assignments, topic posts and course resources from the online platform.
(1-3-1) Course information includes the course ID, name and type; the types comprise general education, elective and required courses. The course ID is the index that links a course with assignment, topic post and course resource information.
(1-3-2) Assignment information includes the assignment ID, type, submission time and the ID of the course it belongs to; the types comprise document, audio and video. The course ID is the index that links an assignment with course information, and the assignment ID is the index that links an assignment with online behavior information.
(1-3-3) Topic post information includes the topic post ID, publication time and the ID of the course it belongs to. The course ID is the index that links a topic post with course information, and the topic post ID is the index that links a topic post with online behavior information.
(1-3-4) Course resource information includes the course resource ID, type and the ID of the course it belongs to; the types comprise document, audio and video. The course ID is the index that links a course resource with course information, and the course resource ID is the index that links a course resource with online behavior information.
(2) Constructing activity sequence features.
Based on the multi-source online learning information collected in step (1), high-dimensional activity sequence features suitable for deep clustering are formed through data cleaning, feature mining and feature reduction.
(2-1) Multi-source information cleaning. Using the index fields, database queries and descriptive statistical analysis are applied to cross-check learner attribute information against online resource information, filter abnormal data and fill in missing values. For each type of activity behavior, the first log record in chronological order is marked as a valid record; the time interval between each subsequent record and the previous valid record of the same type is then calculated one by one, and abnormal records of platform login, course learning, assignment submission, forum communication and course resource evaluation whose interval is smaller than a set threshold are deleted.
Taking platform login as an example, the validity rule is:
$$v_n = \begin{cases} 1, & t_n - t_{\text{prev}} \ge \delta \\ 0, & t_n - t_{\text{prev}} < \delta \end{cases}$$
where $v_n$ is the indicator variable of whether the $n$-th login is valid (1 means the record is valid and is retained; 0 means the record is invalid and is deleted); $t_n$ is the time of the $n$-th login, with $n$ starting from 2; $t_{\text{prev}}$ is the time of the previous valid login; and $\delta$ is the set threshold. The threshold for platform login, course learning and assignment submission is set to 30 min and the threshold for course resource evaluation to 10 min; both can be adjusted according to the actual situation.
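The cleaning rule of step (2-1) can be sketched in Python as follows. The function name `filter_valid` and the sample timestamps are illustrative assumptions; the 30-minute threshold is the value stated above for platform login.

```python
from datetime import datetime, timedelta


def filter_valid(times, threshold):
    """Keep the first record, then keep a record only if it lies at least
    `threshold` after the previous *kept* (valid) record."""
    kept = []
    for t in sorted(times):
        if not kept or t - kept[-1] >= threshold:
            kept.append(t)
    return kept


# Five login timestamps for one learner (made-up data).
base = datetime(2021, 12, 1, 8, 0)
logins = [
    base,
    base + timedelta(minutes=5),   # 5 min after a valid record: dropped
    base + timedelta(minutes=40),  # 40 min after 08:00: kept
    base + timedelta(minutes=60),  # 20 min after 08:40: dropped
    base + timedelta(minutes=75),  # 35 min after 08:40: kept
]
valid = filter_valid(logins, timedelta(minutes=30))
```

Note that the interval is always measured against the previous valid record, not against the immediately preceding record, which is why the 09:15 login survives.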
(2-2) Sequence feature mining. Using the index fields, the activity behaviors of platform login, course learning, assignment submission, forum communication and course resource evaluation are linked by category with the online resource information. Time-domain descriptive statistical analysis converts the real-time behavior operation log and mines the frequency-class and duration-class sequence features of learners' online behaviors over multiple time periods (day, week and month) and multiple course types (required, elective and general education).
(2-2-1) Platform login features. The duration of a single login is calculated from the interval between a learner's matching login and logout operations on the platform; the number of logins and the total login duration in each period are then calculated from the learner's login records and the corresponding single-login durations.
(2-2-2) Course learning features. The duration of a single learning session is calculated from the interval between the start and end of the learner's study of the same course; the number of sessions and the total duration the learner spends on general education, required and elective courses in each period are then calculated from the learning-start records and the single-session durations.
(2-2-3) Assignment submission features. From the learner's assignment submission records, the number of document, audio and video assignments submitted in each type of course in each period is calculated.
(2-2-4) Forum communication features. From the learner's posting and replying records in forum communication about course learning content, the number of posts and replies in each type of course in each period is calculated.
(2-2-5) Course resource evaluation features. From the learner's course resource evaluation records, the number of times the learner comments on, likes, forwards and bookmarks document, audio and video course resources in each period is calculated.
Table 2 presents the specific methods and formulas for calculating the frequency-class and duration-class sequence features of each type of online behavior in each period.
Table 2. Example calculation methods for activity sequence features within a period
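As an illustration of the platform login features in (2-2-1), the sketch below pairs login and logout events and aggregates a frequency-class feature (login count) and a duration-class feature (total minutes) per day. The event encoding, the function name `login_features` and the choice to count only logins that have a matching logout are assumptions of this example, not details taken from the source.

```python
from collections import defaultdict
from datetime import datetime


def login_features(events):
    """events: (timestamp, kind) pairs for one learner, sorted by time,
    where kind is 'login' or 'logout'.
    Returns {date: (login_count, total_minutes)} per calendar day."""
    stats = defaultdict(lambda: [0, 0.0])
    start = None
    for ts, kind in events:
        if kind == "login":
            start = ts
        elif kind == "logout" and start is not None:
            day = start.date()  # a session is attributed to its start day
            stats[day][0] += 1
            stats[day][1] += (ts - start).total_seconds() / 60
            start = None
    return {day: (count, minutes) for day, (count, minutes) in stats.items()}


events = [
    (datetime(2021, 12, 1, 8, 0), "login"),
    (datetime(2021, 12, 1, 8, 45), "logout"),
    (datetime(2021, 12, 1, 14, 0), "login"),
    (datetime(2021, 12, 1, 14, 30), "logout"),
    (datetime(2021, 12, 2, 9, 0), "login"),
    (datetime(2021, 12, 2, 10, 0), "logout"),
]
features = login_features(events)
```

Weekly and monthly features could be obtained in the same way by grouping on the ISO week or the month instead of the date.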
(2-3) Sequence feature reduction. The frequency-class and duration-class sequence features of the online behaviors are linearly transformed using min-max normalization to unify their scales; the online behavior sequence features of each period are then merged by learner ID to form the high-dimensional activity sequence features.
The min-max normalization formula is:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original value of the feature, $x'$ is the normalized value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the original feature, respectively.
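A minimal Python sketch of min-max normalization for one feature column follows. Mapping a constant column to 0.0 is an assumption of this example, since the formula is undefined when the maximum equals the minimum.

```python
def min_max_scale(values):
    """Linearly rescale one feature column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: the formula would divide by zero (assumed handling).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Weekly login counts of three learners (made-up data).
scaled = min_max_scale([2, 4, 10])
```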
(3) Deep clustering modeling of activity.
As shown in FIG. 3, based on the high-dimensional activity sequence features formed in step (2-3), a cascaded autoencoder is constructed to extract key information, learner activity is initially clustered by the affinity propagation algorithm, and the clustering result is optimized by jointly training the cascaded autoencoder and the clustering algorithm.
(3-1) Activity key information extraction. A multi-layer stacked autoencoder is constructed and trained with a greedy layer-wise method to extract the low-dimensional key features implicitly associated with the high-dimensional activity sequence features of platform login, course learning, assignment submission, forum communication and course resource evaluation.
The multi-layer stacked autoencoder uses ReLU as the activation function and Adam as the optimizer. Several autoencoders are cascaded and iteratively fine-tuned stage by stage to minimize the reconstruction error, extracting layer by layer the low-dimensional features $\{H_1, \dots, H_k, \dots, H_C\}$ that carry the complex association information among the high-dimensional sequence features of online learners, where $H_k$ is the key feature extracted by the $k$-th hidden layer, $k = 1, 2, \dots, C$, and $C$ is the number of cascaded autoencoders. The autoencoders are cascaded as follows.
The high-dimensional activity sequence features obtained in (2-3) are taken as the original input $X \in \mathbb{R}^{d \times n}$ of the autoencoder. A linear mapping is constructed from the network parameters, namely the weight matrix $W_1$ and the bias term $b_1$, and each input node is encoded through the nonlinear activation function $g(\cdot)$ to obtain the output $H_1$ of the first hidden layer of the encoder:
$$H_1 = g(W_1 X + b_1)$$
Decoding in the same way gives the reconstructed output $\hat{X}$ of the input samples of the first-stage autoencoder, which has the same dimension as $X$:
$$\hat{X} = g(W_1^{T} H_1 + b_1')$$
According to the principle of minimizing the mean square error (the loss function) between the original input and the reconstructed output of the decoding layer, gradient descent with error back-propagation is used to adjust the network weight matrix $W_1$, the encoding-layer bias term $b_1$ and the decoding-layer bias term $b_1'$. The loss function of the iterative training is
$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert_2^2 + \lambda \lVert W_1 \rVert_2^2$$
where $x_i$ is the original input feature vector of the $i$-th learner, $\hat{x}_i$ is the corresponding reconstructed output of the decoder, $N$ is the number of input learners, $\theta = \{W_1, b_1, b_1'\}$ denotes the network parameters, $\lVert \cdot \rVert_2^2$ denotes the square of the L2 norm, and $\lambda$ is a hyperparameter with $\lambda > 0$.
After the first autoencoder has been trained, its hidden-layer output $H_1$ is taken as the input. Following the same steps, encoding with the weight matrix $W_2$ and the bias term $b_2$ gives the output $H_2$ of the second hidden layer, and decoding with $W_2^{T}$ and $b_2'$ gives the reconstructed output $\hat{H}_1$ of $H_1$.
Further autoencoders can be cascaded in the same way to extract, step by step, the low-dimensional key features that represent the implicit associations among learners' high-dimensional activity sequence features.
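The cascade described in (3-1) can be sketched in plain Python as a forward pass through two stacked autoencoders. This is only an illustration under stated assumptions: weights are tied (the decoder uses the transpose of the encoder matrix), ReLU is also applied in the decoder, the layer sizes 4, 3 and 2 and the random initial weights are arbitrary, and training with Adam and any regularization term are omitted.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def encode(W, b, x):
    """Hidden output for one sample: g(W x + b)."""
    return relu([z + bi for z, bi in zip(matvec(W, x), b)])


def decode(W, b_dec, h):
    """Tied-weight reconstruction (assumption): g(W^T h + b')."""
    return relu([z + bi for z, bi in zip(matvec(transpose(W), h), b_dec)])


def reconstruction_loss(W, b, b_dec, X):
    """Mean squared reconstruction error over all samples."""
    total = 0.0
    for x in X:
        x_hat = decode(W, b_dec, encode(W, b, x))
        total += sum((a - c) ** 2 for a, c in zip(x, x_hat))
    return total / len(X)


rng = random.Random(0)


def init(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Two learners with four normalized features each (made-up data).
X = [[0.1, 0.9, 0.3, 0.5], [0.8, 0.2, 0.4, 0.6]]
W1, b1, b1_dec = init(3, 4), [0.0] * 3, [0.0] * 4   # stage 1: 4 -> 3
W2, b2, b2_dec = init(2, 3), [0.0] * 2, [0.0] * 3   # stage 2: 3 -> 2

H1 = [encode(W1, b1, x) for x in X]   # first hidden layer output
H2 = [encode(W2, b2, h) for h in H1]  # second hidden layer output
loss1 = reconstruction_loss(W1, b1, b1_dec, X)    # stage 1 reconstructs X
loss2 = reconstruction_loss(W2, b2, b2_dec, H1)   # stage 2 reconstructs H1
```

In greedy layer-wise training, the parameters of stage 1 would be optimized against `loss1` first, and stage 2 would then be trained on the fixed `H1` against `loss2`.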
(3-2) Initial activity clustering. Using the affinity propagation algorithm, a learner similarity matrix is calculated from the Euclidean distance, and the responsibility (attraction degree) and availability (attribution degree) between learners are updated iteratively; typical learners are identified according to whether the sum of a learner's responsibility and availability satisfies a given rule, forming the activity classes, and the remaining learners are assigned to these classes.
(3-2-1) Similarity matrix calculation. Learner similarity is calculated from the Euclidean distance between the low-dimensional activity key features extracted by the autoencoder in step (3-1).
The similarity $s(i,j)$ of learner $i$ and learner $j$ is calculated as:
$$s(i,j) = -\sqrt{\sum_{k=1}^{K} (x_{ik} - x_{jk})^2}$$
where $K$ is the dimension of the low-dimensional key features extracted in (3-1), $x_{ik}$ is the value of the $k$-th low-dimensional key feature of the $i$-th learner, and $x_{jk}$ is the value of the $k$-th low-dimensional key feature of the $j$-th learner.
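A small Python sketch of the similarity matrix follows. It assumes that similarity is the negative Euclidean distance, so that closer learners are more similar; the sign convention, and whether the distance is squared (as is common in affinity propagation implementations), are assumptions of this example rather than details confirmed by the source. The diagonal is left at 0.0 here because the self-similarity is set separately in the next step.

```python
import math


def similarity_matrix(features):
    """Pairwise similarity as negative Euclidean distance (assumed form).

    features: list of equal-length key-feature vectors, one per learner."""
    n = len(features)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                s[i][j] = -math.sqrt(
                    sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                )
    return s


# Three learners with two key features each (made-up data).
S = similarity_matrix([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
```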
(3-2-2) Screening of cluster centers. Based on the similarity matrix, two quantities are calculated iteratively: the attraction degree, which represents how suitable a learner is to serve as a cluster center, and the attribution degree, which represents how suitable it is for a learner to belong to a given cluster center. Typical learners that can serve as cluster centers are selected by the rule that the sum of a learner's own attribution degree and attraction degree is greater than 0, which forms the activity cluster centers.
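The attraction degree r(i, j) and the attribution degree a(i, j) are updated as follows:

$$r(i,j) = s(i,j) - \max_{k \in \{1,2,\dots,N\},\ k \neq j}\{a(i,k) + s(i,k)\}$$

$$a(i,j) = \min\Big\{0,\ r(j,j) + \sum_{k \notin \{i,j\}} \max\{0, r(k,j)\}\Big\}\ (i \neq j), \qquad a(j,j) = \sum_{k \neq j} \max\{0, r(k,j)\}$$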
In the initial state, every learner is a potential cluster center, and s(i, i) is uniformly set to the median of the similarity matrix. To accelerate the convergence of the iterative updates of r(i, j) and a(i, j), a damping factor λ with a value between 0 and 1 is introduced to weight the updates of the attribution degree and the attraction degree. The update rule is:

$$r_n \leftarrow (1-\lambda)\, r_n + \lambda\, r_{n-1}$$

$$a_n \leftarrow (1-\lambda)\, a_n + \lambda\, a_{n-1}$$

where $r_n$ and $a_n$ on the right-hand side are the values newly computed in iteration n, and $r_{n-1}$ and $a_{n-1}$ are the values from the previous iteration.
After the attribution degrees and attraction degrees of all learners have been updated, learner i is selected as a cluster center if the sum of the learner's own attribution degree and attraction degree is greater than 0. The rule is:

$$r(i,i) + a(i,i) > 0$$
(3-2-3) Initial cluster partitioning. For each remaining learner, the sum of the attraction degree and the attribution degree with respect to each typical learner is calculated, and the learner is assigned to the activity cluster center for which this sum is largest. This yields the initial clustering of learner activity.
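Steps (3-2-2) and (3-2-3) can be sketched in NumPy as follows. This is a minimal illustration that assumes the standard affinity propagation form of the attribution degree update, sets s(i, i) to the median of the off-diagonal similarities, and uses a damping factor of 0.5 with a fixed number of iterations. The function name, these parameter values, and the toy data are hypothetical.

```python
import numpy as np

def neighbor_propagation(S, damping=0.5, n_iter=200):
    """Cluster with damped attraction (R) / attribution (A) message passing."""
    S = np.array(S, dtype=float)
    n = S.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    np.fill_diagonal(S, np.median(S[off_diag]))  # s(i, i) = median similarity
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(n_iter):
        # r(i,j) = s(i,j) - max_{k != j} {a(i,k) + s(i,k)}
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = (1 - damping) * R_new + damping * R
        # a(i,j) = min{0, r(j,j) + sum_{k not in {i,j}} max(0, r(k,j))}
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_a = A_new.diagonal().copy()   # a(j,j) = sum_{k != j} max(0, r(k,j))
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_a)
        A = (1 - damping) * A_new + damping * A
    E = R + A
    centers = np.flatnonzero(E.diagonal() > 0)   # rule r(i,i) + a(i,i) > 0
    if centers.size == 0:
        return centers, np.full(n, -1)
    # Assign every learner to the center with the largest r + a.
    labels = centers[np.argmax(E[:, centers], axis=1)]
    labels[centers] = centers
    return centers, labels

# Two loose groups of learners in a 2-D key-feature space (toy values).
Z = np.array([[0.0, 0.0], [1.0, 0.1], [-0.9, 0.2],
              [6.0, 0.0], [7.1, -0.1], [5.0, 0.3]])
diff = Z[:, None, :] - Z[None, :, :]
S = -np.sqrt(np.sum(diff**2, axis=-1))
centers, labels = neighbor_propagation(S)
print(centers, labels)
```

Each entry of `labels` is the index of the typical learner to which that learner was assigned.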
(3-3) Activity clustering optimization. An auxiliary target distribution is constructed from the initial clustering result, and the stacked autoencoder and the neighbor propagation clustering algorithm are trained jointly with a KL divergence loss. The hidden-layer weights and bias parameters of the autoencoder are adjusted iteratively, which improves the fit between the key feature structure extracted by the deep model and the cluster analysis and optimizes the activity clustering result.
From the output of the last autoencoder layer and the centroids of the clusters obtained in the initial cluster analysis, the probability distribution Q of learner cluster membership can be calculated as follows:

$$q_{ij} = \frac{\left(1 + \|z_i - \mu_j\|^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'}\left(1 + \|z_i - \mu_{j'}\|^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}$$

where $q_{ij}$ is the probability that the activity of learner i belongs to the j-th class, $z_i$ is the key feature vector of learner i obtained in (3-1), $\mu_j$ is the feature vector of the typical learner of activity class j, and α is the degree of freedom of the Student's t distribution.
The auxiliary target probability distribution P is obtained by raising the learner cluster probabilities to a power and normalizing them, as follows:

$$p_{ij} = \frac{q_{ij}^2 / \sum_{i} q_{ij}}{\sum_{j'}\left(q_{ij'}^2 / \sum_{i} q_{ij'}\right)}$$

The difference between these two distributions over all learners is the KL divergence loss of the joint training component. The loss function $L_{CLU}$ is:

$$L_{CLU} = KL(P \,\|\, Q) = \sum_{i}\sum_{j} p_{ij} \log\frac{p_{ij}}{q_{ij}}$$
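The quantities used in step (3-3) can be sketched in NumPy as below. The sketch assumes the Student's t soft assignment and the squared-and-normalized target distribution commonly used in deep embedded clustering, with α = 1. The function names and the toy features and centers are hypothetical, and the gradient step that updates the autoencoder is omitted.

```python
import numpy as np

def soft_assign(Z, mu, alpha=1.0):
    """q_ij: Student's t similarity between feature z_i and center mu_j."""
    d2 = np.sum((Z[:, None, :] - mu[None, :, :])**2, axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """p_ij: square q, divide by the soft cluster size, renormalize per learner."""
    w = q**2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_loss(p, q):
    """KL(P || Q) summed over all learners and clusters."""
    return float(np.sum(p * np.log(p / q)))

# Four learners and two cluster centers in a 2-D feature space (toy values).
Z = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.1, 3.9]])
mu = np.array([[0.0, 0.0], [4.0, 4.0]])
Q = soft_assign(Z, mu)
P = target_distribution(Q)
print(Q.argmax(axis=1), kl_loss(P, Q))
```

Minimizing `kl_loss(P, Q)` with respect to the encoder parameters pulls each learner's features toward the center it already favors.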
(4) Analysis of activity class characteristics.
Based on the learner activity clustering result optimized in step (3), an association rule algorithm is used to find the features strongly associated with each activity class, a text analysis method is applied to these strongly associated features to mine the semantic characteristics of each activity class, and visualization methods are used to present the activity distribution of online learners with different attributes.
(4-1) Mining of activity association features. The learner activity behavior sequence features constructed in step (2) are discretized by percentile into categorical variables with 5 levels, {very low, low, medium, high, very high}, and converted into a format suitable for association rule analysis, forming the feature set $X_{ap}$. The Apriori association rule algorithm is then used to calculate the support, confidence and lift (promotion degree) of the item sets that combine activity sequence features with each activity class, and to find the typical features strongly associated with each activity class.
Suppose that $X_s$ is a subset of the learner activity behavior sequence feature set, for example {course learning duration in one week: very high, course learning frequency in one week: high, forum communication frequency in one week: low}, and that $Y_c$, $c \in \{1, 2, \dots, k\}$, is a category of learner activity, where k is the number of activity categories. The association rule analysis mainly comprises two steps.
First, find the frequent item sets of behavior features related to $Y_c$. The frequency with which the activity class $Y_c$ and a specific behavior sequence feature item set $X_s$ occur together is called the support of the item set $\{X_s, Y_c\}$, and is recorded as:

$$\mathrm{support}(X_s, Y_c) = \frac{\mathrm{count}(X_s \cup Y_c)}{N}$$

where $\mathrm{count}(X_s \cup Y_c)$ is the number of learners who have the features $X_s$ and belong to activity class $Y_c$, and N is the total number of learners in the analysis sample. When the support exceeds a set threshold, the item set is called a frequent item set. The threshold is generally set to 0.5 and can be adjusted according to actual conditions.
Second, generate the rules associated with $Y_c$ from the frequent item sets. The probability that a learner whose activity class is $Y_c$ has the features $X_s$, also called the confidence of the rule $Y_c \Rightarrow X_s$, is calculated as follows:

$$\mathrm{confidence}(Y_c \Rightarrow X_s) = P(X_s \mid Y_c) = \frac{\mathrm{count}(X_s \cup Y_c)}{\mathrm{count}(Y_c)}$$

When the confidence is greater than the set threshold, $Y_c \Rightarrow X_s$ can be considered a reliable rule, that is, $X_s$ is a set of features closely related to $Y_c$. The confidence threshold is generally set to 0.5 and can be adjusted according to actual conditions.
After the confidence is calculated, the rules are further screened by the lift index to find the sequence features closely related to $Y_c$. Lift compares the probability that a learner of activity class $Y_c$ has the behavior features $X_s$ with the probability that a learner in the whole sample has them. A value greater than 1 indicates that learners with the features $X_s$ are more likely than average to belong to activity class $Y_c$, that is, the two are closely related. The lift is calculated as follows:

$$\mathrm{lift}(X_s, Y_c) = \frac{P(X_s \mid Y_c)}{P(X_s)} = \frac{N \cdot \mathrm{count}(X_s \cup Y_c)}{\mathrm{count}(X_s)\,\mathrm{count}(Y_c)}$$
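The discretization and the three rule metrics of step (4-1) can be sketched as follows. The sketch assumes quintile cut points (20th, 40th, 60th and 80th percentiles) for the 5 levels and takes confidence as the share of learners in an activity class who have the feature set. The function names, the item strings such as `dur=high`, and the toy data are hypothetical, and the candidate generation of a full Apriori search is not shown.

```python
import numpy as np

LEVELS = ["very low", "low", "medium", "high", "very high"]

def discretize(values):
    """Map a numeric feature to 5 percentile-based levels (quintiles assumed)."""
    edges = np.percentile(values, [20, 40, 60, 80])
    return [LEVELS[i] for i in np.searchsorted(edges, values, side="right")]

def rule_metrics(transactions, labels, itemset, target):
    """Support, confidence and lift linking `itemset` to activity class `target`."""
    n = len(transactions)
    has_x = np.array([itemset <= t for t in transactions])  # subset test
    is_y = np.array([lab == target for lab in labels])
    support = np.sum(has_x & is_y) / n
    confidence = support / (is_y.sum() / n)   # share of class members with itemset
    lift = confidence / (has_x.sum() / n)
    return support, confidence, lift

# Ten learners' weekly course learning minutes (toy values).
levels = discretize(np.arange(1, 11))

# Four learners, each a set of discretized feature items, with an activity class.
transactions = [{"dur=high", "freq=high"}, {"dur=high"},
                {"freq=low"}, {"dur=high", "freq=high"}]
labels = ["active", "active", "inactive", "active"]
support, confidence, lift = rule_metrics(transactions, labels,
                                         {"dur=high"}, "active")
print(levels[0], levels[-1], support, confidence)  # very low very high 0.75 1.0
```

In this toy sample the lift is 1.0 / 0.75, which is greater than 1, so the item set would pass the lift screening.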
(4-2) Semantic analysis of activity association features. A text analysis method is used to calculate the frequency of the strongly associated features of each activity class and the co-occurrence frequency among the features, and the results are visualized as word clouds and co-occurrence networks to mine the semantic characteristics of each activity class.
The frequency of the associated features is calculated with the TF-IDF method:

$$tf_i^c = \frac{n_i^c}{N_c}$$

where $n_i^c$ is the number of learners in activity category $Y_c$ who have the strongly associated feature $X_i$, and $N_c$ is the total number of learners in activity category $Y_c$;

$$tfidf_i^c = tf_i^c \times idf_i^c$$

where $idf_i^c$ is the inverse document frequency of the feature $X_i$, and $tfidf_i^c$ is the weighted frequency with which the strongly associated feature $X_i$ occurs among learners of activity category $Y_c$.
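A minimal sketch of this weighting is shown below. The term frequency follows the definition above. The inverse document frequency is an assumption made for illustration: each activity category is treated as one document, so the idf of a feature is the logarithm of the number of categories divided by the number of categories in which the feature appears. The function name and the toy counts are hypothetical.

```python
import math

def tfidf_by_category(feature_counts, category_sizes):
    """feature_counts[c][x]: learners in category c with feature x.
    category_sizes[c]: total number of learners in category c."""
    k = len(category_sizes)
    # Number of categories in which each feature occurs at all.
    doc_freq = {}
    for counts in feature_counts.values():
        for feature, n in counts.items():
            if n > 0:
                doc_freq[feature] = doc_freq.get(feature, 0) + 1
    scores = {}
    for c, counts in feature_counts.items():
        scores[c] = {
            feature: (n / category_sizes[c]) * math.log(k / doc_freq[feature])
            for feature, n in counts.items() if n > 0
        }
    return scores

# Two activity categories of 10 learners each (toy counts).
feature_counts = {
    "high": {"A": 8, "B": 5},
    "low": {"B": 4, "C": 6},
}
category_sizes = {"high": 10, "low": 10}
scores = tfidf_by_category(feature_counts, category_sizes)
print(scores["high"]["B"])  # 0.0
```

Feature B occurs in both categories, so under this assumed idf it receives zero weight and would not stand out in a word cloud, while A and C characterize their own categories.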
(4-3) Activity distribution visualization. Line charts, heat maps, bubble charts and calendar charts are used to present the distribution of learner activity across demographic, regional affiliation and platform registration attributes.
Those matters not described in detail in this specification are well within the knowledge of those skilled in the art.
Those skilled in the art will readily appreciate that the foregoing is only a preferred embodiment of this invention and is not intended to limit the invention to the details shown. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. An online learner activity evaluation method based on a deep clustering algorithm, characterized by comprising the following steps:
S1, collecting multi-source online learning information: collecting learner attribute information, including demographics, regional affiliation and platform registration, from the online platform management system through database operations; collecting online behavior information on learner platform login, course learning, assignment submission, forum communication and course resource evaluation using code-based event tracking (embedded point) technology; and acquiring online resource information on courses, assignments, topic posts and course resources from the online platform using web crawler technology;
S2, constructing activity sequence features: cleaning the learner attribute, online behavior and online resource information through database queries and descriptive statistical analysis; mining high-dimensional online behavior sequence features over different time periods using time-domain descriptive statistics; and linearly transforming the time sequence features according to a data normalization rule so that all data share a common scale;
S3, deep clustering modeling of activity: constructing cascaded autoencoders (self-encoders) to extract the low-dimensional key features that represent the implicit associations among the learner activity sequence features; identifying the activity classes with the neighbor propagation (affinity propagation) algorithm and assigning learners to each activity class; and jointly training the autoencoders and the neighbor propagation algorithm, starting from the initial clustering, to optimize the activity clustering result;
S4, analyzing activity class characteristics: calculating the support, confidence and lift (promotion degree) between the activity sequence features and each activity class using an association rule algorithm, and finding the typical features strongly associated with each activity class; mining the semantic characteristics of the typical features of each activity class using a text analysis method; and presenting the distribution of learner activity across demographic, regional affiliation and platform registration attributes using visualization methods.
2. The online learner activity evaluation method based on a deep clustering algorithm according to claim 1, wherein the step S1 of collecting the multi-source online learning information specifically comprises:
S1.1, extracting learner attribute information: using SQL statements to extract, from the database tables of the online platform management system, learner demographic information including gender, age, grade and major, regional affiliation information including the codes of the school, county, city and province to which the learner belongs, and platform registration information including learner ID and registration time, wherein the learner ID is the index linking learner attributes with online behavior and online resource information;
S1.2, collecting online behavior information: collecting, according to an event model and using code-based event tracking technology, online behavior information on learner platform login, course learning, assignment submission, forum communication and course resource evaluation to form a real-time behavior operation log; the behavior operation log comprises the learner ID, operation object, behavior event type, operation time and operation object ID; the learner ID is the index linking online behavior with learner attribute information;
S1.3, crawling online resource information: acquiring online resource information on courses, assignments, topic posts and course resources from the online platform using web crawler technology.
3. The online learner activity evaluation method based on a deep clustering algorithm according to claim 2, wherein in the step S1.2 of collecting the online behavior information:
the behavior event types comprise login and logout for the platform, start and end for course learning, submission for assignments, posting and replying for forum communication, and commenting, liking, forwarding and bookmarking for course resource evaluation;
the operation objects comprise the platform, courses, assignments, topic posts and course resources; the operation object ID is the collective term for the platform, course, assignment, topic post and course resource IDs, and is the index linking online behavior with resource information.
4. The online learner activity evaluation method based on a deep clustering algorithm according to claim 2, wherein in the step S1.3 of crawling online resource information:
the course information comprises the course ID, name and type; the types comprise general education, elective and compulsory; the course ID is the index linking a course with assignment, topic post and course resource information;
the assignment information comprises the assignment ID, type, submission time and course ID; the types comprise document, audio and video; the course ID is the index linking an assignment with course information, and the assignment ID is the index linking an assignment with online behavior information;
the topic post information comprises the topic post ID, publication time and course ID; the course ID is the index linking a topic post with course information; the topic post ID is the index linking a topic post with online behavior information;
the course resource information comprises the course resource ID, type and course ID; the types comprise document, audio and video; the course ID is the index linking a course resource with course information, and the course resource ID is the index linking a course resource with online behavior information.
5. The online learner activity evaluation method based on a deep clustering algorithm according to claim 1, wherein the step S2 of constructing the activity sequence features specifically comprises:
S2.1, multi-source information cleaning: using database queries and descriptive statistical analysis on the index fields to cross-check learner attribute and online resource information, filter abnormal data and fill in missing values; taking the first record of each kind of activity behavior operation, in the time order of the log, as a valid record, calculating one by one the operation time interval between each record and the previous valid record of the same kind, setting a threshold, and deleting abnormal records whose time interval is less than the threshold;
S2.2, mining sequence features: classifying and linking, by the index fields, the online behavior and resource information on platform login, course learning, assignment submission, forum communication and course resource evaluation; converting the real-time behavior operation log using time-domain descriptive statistical analysis; and mining the frequency-type and duration-type sequence features of each kind of learner online behavior over multiple time periods (day, week and month) and multiple course types (compulsory, elective and general education);
S2.3, sequence feature normalization: linearly transforming the frequency-type and duration-type time sequence features of each kind of online behavior using min-max normalization so that the data share a common scale; and merging the time sequence features of all online behaviors in each period by learner ID to form the high-dimensional activity sequence features.
6. The online learner activity evaluation method based on a deep clustering algorithm according to claim 5, wherein the step S2.2 of mining the sequence features specifically comprises:
platform login features: calculating the duration of a single login from the time interval between a learner's login and logout operations in the same session on the platform; and calculating the number of logins and the total login duration of the learner in each period from the learner's platform login time records and the corresponding single-login durations;
course learning features: calculating the duration of a single course learning session from the time interval between the start and the end of the learner's study of the same course; and calculating the number of sessions and the total duration of learning general education, compulsory and elective courses in each period from the learner's course learning start records and single-session durations;
assignment submission features: calculating the number of document, audio and video assignments submitted for each type of course in each period from the learner's assignment submission records;
forum communication features: calculating the number of posts and replies for each type of course in each period from the records of the learner's posting and replying behavior in forum communication on course learning content;
course resource evaluation features: calculating the number of comments, likes, forwards and bookmarks the learner made on document, audio and video course resources in each period from the records of the learner's course resource evaluation behavior.
7. The online learner activity evaluation method based on a deep clustering algorithm according to claim 1, wherein the step S3 of deep clustering modeling of activity specifically comprises:
S3.1, extracting activity key information: constructing a multilayer stacked autoencoder and, using a greedy layer-by-layer training method, extracting the low-dimensional key features that capture the implicit associations among the high-dimensional activity sequence features derived from learner platform login, course learning, assignment submission, forum communication and course resource evaluation;
S3.2, initial activity clustering: using the neighbor propagation algorithm, calculating the learner similarity matrix with the Euclidean distance formula and iteratively updating the attraction degree and attribution degree between learners; identifying typical learners according to whether the sum of the two satisfies a given rule, forming the activity classes, and assigning the remaining learners to each activity class;
S3.3, activity clustering optimization: constructing an auxiliary target distribution from the initial clustering result, jointly training the stacked autoencoder and the neighbor propagation clustering algorithm with a KL divergence loss, and iteratively adjusting the hidden-layer weights and bias parameters of the autoencoder, thereby improving the fit between the key feature structure extracted by the deep model and the cluster analysis and optimizing the activity clustering result.
8. The online learner activity evaluation method based on a deep clustering algorithm according to claim 7, wherein the step S3.2 of initial activity clustering specifically comprises:
S3.2.1, similarity matrix calculation: calculating the similarity between learners with the Euclidean distance formula from the low-dimensional key features of learner activity extracted by the autoencoder in S3.1;
S3.2.2, screening cluster centers: iteratively calculating, from the similarity matrix, the attraction degree, which represents how suitable a learner is to serve as a cluster center, and the attribution degree, which represents how suitable it is for a learner to belong to a given cluster center; and selecting the typical learners that can serve as cluster centers by the rule that the sum of a learner's own attribution degree and attraction degree is greater than 0, forming the activity cluster centers;
S3.2.3, initial cluster partitioning: calculating the sum of the attraction degree and attribution degree between each remaining learner and each typical learner, and assigning each remaining learner to the activity cluster center with the largest sum, thereby realizing the initial clustering of learner activity.
9. The online learner activity evaluation method based on a deep clustering algorithm according to claim 1, wherein the step S4 of analyzing the activity class characteristics specifically comprises:
S4.1, mining activity association features: converting the learner activity sequence features constructed in S2 by data discretization to form the feature item sets; and calculating the support, confidence and lift (promotion degree) of the item sets combining activity sequence features with each activity class using the Apriori association rule algorithm, to find the typical features strongly associated with each activity class;
S4.2, semantic analysis of activity association features: calculating the occurrence frequency of the strongly associated features of each activity class and the co-occurrence frequency among the features using a text analysis method, and visualizing them as word clouds and co-occurrence networks to mine the semantic characteristics of each activity class;
S4.3, visualizing activity distribution patterns: presenting the distribution of learner activity across demographic, regional affiliation and platform registration attributes using line charts, heat maps, bubble charts and calendar charts.
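The log cleaning of S2.1 and the min-max normalization of S2.3 recited in claim 5 can be illustrated with the short sketch below. It assumes that log records are tuples of learner ID, event type and a timestamp in seconds, and that records of the same kind means records with the same learner ID and event type. The function names, the 5-second threshold and the toy records are hypothetical.

```python
def drop_rapid_repeats(records, min_gap):
    """Keep the first record of each kind, then only records at least
    `min_gap` seconds after the previous valid record of the same kind."""
    last_valid = {}
    kept = []
    for learner_id, event_type, ts in sorted(records, key=lambda r: r[2]):
        key = (learner_id, event_type)
        if key not in last_valid or ts - last_valid[key] >= min_gap:
            kept.append((learner_id, event_type, ts))
            last_valid[key] = ts
    return kept

def min_max_scale(values):
    """Linearly map values onto [0, 1]; a constant feature maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

records = [("u1", "login", 0), ("u1", "login", 2),
           ("u1", "post", 3), ("u1", "login", 10)]
cleaned = drop_rapid_repeats(records, min_gap=5)
print(cleaned)                   # the login at t=2 is dropped
print(min_max_scale([2, 4, 6]))  # [0.0, 0.5, 1.0]
```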
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111649117.9A CN114429281B (en) | 2021-12-30 | 2021-12-30 | Online learner activity degree evaluation method based on deep clustering algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114429281A (en) | 2022-05-03 |
CN114429281B (en) | 2022-11-15 |
Family
ID=81311268
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111649117.9A Active CN114429281B (en) | 2021-12-30 | 2021-12-30 | Online learner activity degree evaluation method based on deep clustering algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114429281B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114662995A (en) * | 2022-05-19 | 2022-06-24 | 山东经贸职业学院 | Online learning effect evaluation method and system based on artificial intelligence |
CN114707471B (en) * | 2022-06-06 | 2022-09-09 | 浙江大学 | Artificial intelligent courseware making method and device based on hyper-parameter evaluation graph algorithm |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20190099156A (en) * | 2019-08-06 | 2019-08-26 | 엘지전자 주식회사 | Method and device for authenticating user using user's behavior pattern |
CN110782375A (en) * | 2019-09-05 | 2020-02-11 | 华南师范大学 | Online learning overall process dynamic analysis method and system based on data |
CN111460249A (en) * | 2020-02-24 | 2020-07-28 | 桂林电子科技大学 | Personalized learning resource recommendation method based on learner preference modeling |
WO2020182710A1 (en) * | 2019-03-12 | 2020-09-17 | F. Hoffmann-La Roche Ag | Multiple instance learner for prognostic tissue pattern identification |
CN112101039A (en) * | 2020-08-05 | 2020-12-18 | 华中师范大学 | Learning interest discovery method for online learning community |
CN112115357A (en) * | 2020-09-11 | 2020-12-22 | 华中师范大学 | Online course forum interaction mode identification method and system |
CN113077100A (en) * | 2021-04-16 | 2021-07-06 | 西安交通大学 | Online learning potential exit prediction method based on automatic coding machine |
Non-Patent Citations (2)
Title |
---|
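Analysis and prediction model of student academic performance based on clustering algorithm (基于聚类算法的学生学业表现分析预测模型); Chen Lai et al.; Journal of Shanxi Police College (《山西警察学院学报》); 2020-04-15 (No. 02); full text * |
Research on learning behavior analysis for MOOC classroom feedback (面向MOOC课堂反馈的学习行为分析研究); Gu Xin et al.; Journal of Central China Normal University (Natural Sciences) (《华中师范大学学报(自然科学版)》); 2018-08-10 (No. 04); full text * |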
Also Published As
Publication number | Publication date |
---|---|
CN114429281A (en) | 2022-05-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||