CN114429281B - Online learner activity degree evaluation method based on deep clustering algorithm - Google Patents

Info

Publication number
CN114429281B
Authority
CN
China
Prior art keywords
learner
course
online
activity
liveness
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111649117.9A
Other languages
Chinese (zh)
Other versions
CN114429281A (en)
Inventor
卢春
李淼云
吴砥
钟正
徐建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN202111649117.9A
Publication of CN114429281A
Application granted
Publication of CN114429281B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/904Browsing; Visualisation therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Primary Health Care (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Educational Technology (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of data analysis and provides an online learner activity evaluation method based on a deep clustering algorithm. Database operations, event-tracking code and web crawler technology are used to obtain multi-source information from an online platform: learner attributes, online behavior and online resources. Following the full-view learning theory, the activity of online learners is evaluated comprehensively from the learners' interactions with the online platform, with the learning content and with other learners. The activity distribution of learners across demographic, regional and platform-registration attributes is presented through visualization, providing a new approach for evaluating the application and service level of online platforms.

Description

Online learner activity degree evaluation method based on deep clustering algorithm
Technical Field
The invention relates to the field of data analysis, in particular to an online learner activity evaluation method based on a deep clustering algorithm.
Background
With the deep integration of information technology and education, online education can meet learners' personalized and customized needs and has become a regular mode of teaching. The online learning platform is the main carrier of online education and retains a large amount of real, detailed data on learners' operation behavior. Mining this sequence data can reveal latent associations behind learner behavior and patterns related to learning activities. Exploring the activity of learners with different backgrounds on the platform gives insight into their behavioral habits and patterns, which helps optimize the online learning experience, strengthen learning engagement and resource use, and in turn improve learning quality. For an online learning platform, analyzing learner activity reveals learners' behavior patterns and their implicit evaluation of resources and services, and provides direction for improving the platform and optimizing learner services.
Current research mainly measures and characterizes learner activity by the number of visits a learner makes over a period. This calculation is too simple and coarse: it can identify some highly active learners, but it misses learners who rank low on such statistics yet show high-quality activity, such as learners who log in rarely but interact frequently with other learners and are active on a regular basis. The importance of these users is hard to capture with simple statistics.
Disclosure of Invention
To address the defects of the prior art and the need to improve on it, the invention provides an online learner activity evaluation method based on a deep clustering algorithm. Following the full-view learning theory, the method comprehensively evaluates the activity of online learners from their interactions with the online platform, the learning content and other learners, and provides a new approach for evaluating the application and service level of online platforms.
The purpose of the invention is achieved by the following technical measures.
The invention provides an online learner activity evaluation method based on a deep clustering algorithm, which comprises the following steps:
S1. Collect multi-source online learning information. Collect learner attribute information, including demographics, regional affiliation and platform registration, from the online platform management system using database operations; collect online behavior information on platform login, course learning, homework submission, forum communication and course resource evaluation using event-tracking code (code instrumentation); and acquire online resource information on courses, homework, topic posts and course resources from the online platform using a web crawler.
S2. Construct activity sequence features. Clean the learner attribute, online behavior and online resource information using database queries and descriptive statistical analysis; mine high-dimensional online behavior sequence features over different time periods using time-domain descriptive statistics; and linearly transform the time-series features according to a data normalization rule so that all features share a common scale.
S3. Build the deep clustering model of activity. Construct cascaded autoencoders to extract low-dimensional key features that represent the implicit associations among learner activity sequence features; identify activity classes with the affinity propagation (neighbor propagation) algorithm and assign learners to each class; and, starting from the initial activity clustering, jointly train the autoencoders and the affinity propagation algorithm to optimize the clustering result.
S4. Analyze the characteristics of each activity class. Use an association rule algorithm to compute the support, confidence and lift between activity sequence features and each activity class, and find the typical features strongly associated with each class; mine the semantic features of those typical features using text analysis; and present, through visualization, the distribution of learner activity across demographic, regional and platform-registration attributes.
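The support, confidence and lift named in S4 can be illustrated with a minimal Python sketch. The item labels (`high_forum`, `active` and so on) and the helper `rule_metrics` are hypothetical; the patent does not specify how features are discretized into items, so each learner is simply treated here as a set of labels.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    transactions: list of sets, one set of item labels per learner
    (an assumed encoding, not taken from the patent).
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent in t)
    n_c = sum(1 for t in transactions if consequent in t)
    n_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical discretized features and activity classes for four learners.
learners = [
    {"high_forum", "active"},
    {"high_forum", "active"},
    {"low_forum", "inactive"},
    {"high_forum", "inactive"},
]
support, confidence, lift = rule_metrics(learners, "high_forum", "active")
```

A lift above 1 indicates that the feature and the activity class occur together more often than they would if independent, which is one common reading of "strongly associated".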
The invention has the beneficial effects that:
Multi-source information on learner attributes, online behavior and online resources is obtained from the online platform using database operations, event-tracking code and a web crawler. Based on the full-view learning theory, and from the perspective of the learner's interaction with the online platform, the learning content and other learners, high-dimensional activity sequence features of the duration type and the frequency type are constructed for platform login, course learning, homework submission, forum communication and course resource evaluation, over different time periods, course types and course resource types. An autoencoder network extracts low-dimensional key features that represent the associations among the high-dimensional activity sequence features; the affinity propagation algorithm produces the initial clustering of learner activity; and the autoencoder network and the affinity propagation algorithm are trained jointly, adjusting the autoencoder parameters to optimize the clustering result. An association rule algorithm finds the sequence features strongly associated with each activity class, and text analysis computes the frequency of strongly associated features and the co-occurrence frequency among features within each activity class to mine the semantic features of each class. The activity distribution of learners across demographic, regional and platform-registration attributes is presented through visualization.
The method makes it convenient for online platform managers to evaluate the application and service level of the platform according to learner activity, gives direction for improving platform services and optimizing the learner experience, and thereby improves learning efficiency and learning quality.
Drawings
FIG. 1 is a schematic diagram of the online learner activity evaluation method based on a deep clustering algorithm according to the invention.
FIG. 2 is a structure diagram of the multi-source online learning information content in the invention.
FIG. 3 is a schematic diagram of the deep clustering algorithm for online learner activity evaluation in the invention.
Detailed Description
To illustrate the technical solutions in the embodiments of the invention more clearly, the invention is described in further detail below with reference to the accompanying drawings and embodiments. The embodiments described here only explain the invention and are not intended to limit it.
As shown in FIG. 1, an embodiment of the invention provides an online learner activity evaluation method based on a deep clustering algorithm, comprising the following steps:
(1) Collect multi-source online learning information.
As shown in FIG. 2, the information used to evaluate online learner activity in the invention comprises learner attributes, online behavior and online resources. The manner and details of collecting each type of information are described below.
(1-1) Learner attribute information collection. SQL statements are used to extract, from the database tables of the online platform management system, learners' demographic information (gender, age, grade and major), regional affiliation information (school, county, city and province codes) and platform registration information (learner ID and registration time). The learner ID is the index that links a learner's individual attributes with learning behavior and online resource information.
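As a rough illustration of the SQL extraction in (1-1), the sketch below queries an in-memory SQLite database from Python. The table name `learner` and all column names are assumptions made for the example; the patent does not describe the platform's actual schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE learner ("
    "learner_id TEXT PRIMARY KEY, gender TEXT, age INTEGER, grade TEXT, "
    "major TEXT, school_code TEXT, county_code TEXT, city_code TEXT, "
    "province_code TEXT, registered_at TEXT)"
)
conn.execute(
    "INSERT INTO learner VALUES ('stu000001', 'F', 20, '2', 'Physics', "
    "'S01', 'C01', 'CT01', 'P01', '2021-09-01 08:00:00')"
)


def fetch_learner_attributes(connection):
    """Return one dict per learner with demographic, regional and registration fields."""
    connection.row_factory = sqlite3.Row
    rows = connection.execute(
        "SELECT learner_id, gender, age, grade, major, school_code, "
        "county_code, city_code, province_code, registered_at "
        "FROM learner ORDER BY learner_id"
    ).fetchall()
    return [dict(row) for row in rows]


attributes = fetch_learner_attributes(conn)
```

Keeping `learner_id` in every returned record preserves the index that later links attributes to the behavior log.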
(1-2) Online behavior information collection. Following an event model, event-tracking code is used to collect online behavior information on platform login, course learning, homework submission, forum communication and course resource evaluation, forming a real-time behavior operation log. Each log record comprises the learner ID, operation object, behavior event type, operation time and operation object ID; the learner ID is the index that links online behavior with learner attribute information. Table 1 shows sample records from the behavior operation log of a single learner.
Table 1. Example of a learner's online behavior operation log

Learner ID   Operation object   Behavior event type        Operation time   Operation object ID
stu000001    Platform           Login                      T1               000000
stu000001    Platform           Logout                     T2               000000
stu000001    Course             Learning start             T3               clp000001
stu000001    Course             Learning end               T4               clp000001
stu000001    Homework           Homework submission        T7               hw000001
stu000001    Topic post         Post                       T8               tp000001
stu000001    Topic post         Reply                      T9               tp000001
stu000001    Course resource    Course resource comment    T10              res000001
stu000001    Course resource    Course resource like       T11              res000001
stu000001    Course resource    Course resource forward    T12              res000001
stu000001    Course resource    Course resource bookmark   T13              res000001
(1-2-1) The behavior event types comprise login and logout for the platform; learning start and learning end for courses; submission for homework; posting and replying for forum communication; and commenting, liking, forwarding and bookmarking for course resource evaluation.
(1-2-2) The operation objects comprise the platform, courses, homework, topic posts and course resources. The operation object ID field holds the ID of whichever of these objects the record refers to, and is the index that links online behavior with online resource information.
(1-3) Online resource information collection. A web crawler is used to acquire online resource information on courses, homework, topic posts and course resources from the online platform.
(1-3-1) Course information comprises the course ID, name and type; the types are general education, elective and required. The course ID is the index that links a course with homework, topic post and course resource information.
(1-3-2) Homework information comprises the homework ID, type, submission time and the ID of the course it belongs to; the types are document, audio and video. The course ID is the index that links homework with course information, and the homework ID is the index that links homework with online behavior information.
(1-3-3) Topic post information comprises the topic post ID, publication time and the ID of the course it belongs to. The course ID is the index that links a topic post with course information, and the topic post ID is the index that links a topic post with online behavior information.
(1-3-4) Course resource information comprises the course resource ID, type and the ID of the course it belongs to; the types are document, audio and video. The course ID is the index that links course resources with course information, and the course resource ID is the index that links course resources with online behavior information.
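The ID fields described in (1-2) and (1-3) act as join keys. A small Python sketch of following those keys from a log record to a course type is given below; the `LogRecord` class, the lookup dictionaries, the course name and the helper `course_type_of` are hypothetical and serve only to show the linkage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """One row of the behavior operation log (field names are assumed)."""
    learner_id: str
    object_kind: str   # "platform", "course", "homework", "topic_post", "course_resource"
    event_type: str
    time: str
    object_id: str


# Hypothetical crawled resource tables, keyed by their IDs.
courses = {"clp000001": {"name": "Example course", "type": "required"}}
resources = {"res000001": {"type": "video", "course_id": "clp000001"}}


def course_type_of(record):
    """Follow the ID links from a log record to the type of the related course."""
    if record.object_kind == "course":
        return courses[record.object_id]["type"]
    if record.object_kind == "course_resource":
        course_id = resources[record.object_id]["course_id"]
        return courses[course_id]["type"]
    return None  # platform events are not tied to a course


comment = LogRecord("stu000001", "course_resource", "comment", "T10", "res000001")
login = LogRecord("stu000001", "platform", "login", "T1", "000000")
```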
(2) Construct activity sequence features.
Based on the multi-source online learning information collected in step (1), high-dimensional activity sequence features suitable for deep clustering are formed through data cleaning, feature mining and feature reduction.
(2-1) Multi-source information cleaning. Using the index fields, database queries and descriptive statistical analysis are applied to cross-check learner attributes against online resource information, filter abnormal data and fill missing values. For each type of activity behavior, the first record in log time order is set as a valid record; the time interval between each subsequent record and the previous valid record of the same type is then computed one by one, and records of platform login, course learning, homework submission, forum communication and course resource evaluation whose interval is smaller than a set threshold are deleted as abnormal.
Taking platform login as an example:

$$v_n = \begin{cases} 1, & t_n - t_{\mathrm{valid}} \ge \tau \\ 0, & t_n - t_{\mathrm{valid}} < \tau \end{cases}$$

where $v_n$ is the indicator of whether the $n$-th login record is valid (1 means the record is valid and is kept; 0 means it is invalid and is deleted); $t_n$ is the time of the $n$-th login, with $n$ starting from 2; $t_{\mathrm{valid}}$ is the time of the previous valid login; and $\tau$ is the set threshold. The threshold is set to 30 min for platform login, course learning and homework submission and to 10 min for course resource evaluation, and can be adjusted to the actual situation.
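The cleaning rule of (2-1) can be sketched in a few lines of Python. The function name `filter_valid` and the sample timestamps are illustrative; the only behavior taken from the text is that the first record is always valid and that a later record is kept only when it lies at least one threshold after the previous valid record.

```python
from datetime import datetime, timedelta


def filter_valid(timestamps, threshold):
    """Keep the first record, then only records at least `threshold` after the last kept one."""
    kept = []
    for t in sorted(timestamps):
        if not kept or t - kept[-1] >= threshold:
            kept.append(t)
    return kept


base = datetime(2022, 1, 3, 8, 0)
logins = [base, base + timedelta(minutes=10), base + timedelta(minutes=45), base + timedelta(minutes=60)]
valid_logins = filter_valid(logins, timedelta(minutes=30))
```

Here the 08:10 login is dropped (10 min after the last valid record), 08:45 is kept (45 min after 08:00), and 09:00 is dropped (15 min after 08:45). Note that intervals are measured against the previous valid record, not the previous raw record.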
(2-2) Sequence feature mining. Using the index fields, the activity behaviors of platform login, course learning, homework submission, forum communication and course resource evaluation are linked and merged by category with the online resource information. Time-domain descriptive statistical analysis is applied to transform the real-time behavior operation log, and frequency-type and duration-type sequence features of each online behavior are mined over multiple time periods (day, week and month) and multiple course types (required, elective and general education).
(2-2-1) Platform login features. The duration of a single login is computed from the time interval between a learner's login and the matching logout on the platform; the number of logins and the total login duration in each period are then computed from the learner's login time records and the corresponding single-login durations.
(2-2-2) Course learning features. The duration of a single course learning session is computed from the time interval between the start and end of the same session; the number of sessions and the total duration of learning general education, required and elective courses in each period are then computed from the learner's learning-start records and single-session durations.
(2-2-3) Homework submission features. From the learner's homework submission records, the number of document, audio and video homework submissions in each course type in each period is computed.
(2-2-4) Forum communication features. From the learner's posting and replying records in forum discussion of course learning content, the number of posts and replies in each course type in each period is computed.
(2-2-5) Course resource evaluation features. From the learner's course resource evaluation records, the number of comments, likes, forwards and bookmarks the learner gives to document, audio and video course resources in each period is computed.
Table 2 gives the calculation methods and formulas for the frequency-type and duration-type sequence features of each online behavior in each period.
Table 2. Example calculation methods for activity sequence features within a period
[Table 2 is supplied as images: Figures GDA0003833268160000081, GDA0003833268160000091 and GDA0003833268160000101]
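Since the calculation table survives only as images, the following Python sketch shows one plausible way to derive the platform login features of (2-2-1) for a daily period. The event tuple format and the function `login_features` are assumptions; sessions are attributed to the day on which they start, which is a simplification the source does not confirm.

```python
from collections import defaultdict
from datetime import datetime


def login_features(events):
    """Daily login count and total login seconds for one learner.

    events: time-ordered (event_type, datetime) pairs, where event_type is
    "login" or "logout". Each login is paired with the next logout.
    """
    counts = defaultdict(int)
    seconds = defaultdict(float)
    start = None
    for kind, ts in events:
        if kind == "login":
            start = ts
        elif kind == "logout" and start is not None:
            day = start.date().isoformat()
            counts[day] += 1
            seconds[day] += (ts - start).total_seconds()
            start = None
    return dict(counts), dict(seconds)


events = [
    ("login", datetime(2022, 1, 3, 8, 0)),
    ("logout", datetime(2022, 1, 3, 8, 30)),
    ("login", datetime(2022, 1, 3, 10, 0)),
    ("logout", datetime(2022, 1, 3, 10, 45)),
]
daily_counts, daily_seconds = login_features(events)
```

Weekly or monthly features would follow the same pattern with a different grouping key, and course learning durations can be derived analogously from learning start and learning end events.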
(2-3) Sequence feature reduction. The frequency-type and duration-type sequence features of each online behavior are linearly transformed with min-max normalization so that all features share a common scale; the online behavior sequence features of each period are then merged by learner ID to form the high-dimensional activity sequence features.
The min-max normalization formula is:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is the original feature value, $x'$ is the normalized feature value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the original feature.
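A short NumPy sketch of the min-max normalization above, applied column by column to a learner-by-feature matrix. Mapping constant columns to 0 is an assumption added here to avoid division by zero; the source does not say how such features are handled.

```python
import numpy as np


def min_max_normalize(features):
    """Scale each column of a (learners x features) array to the range [0, 1]."""
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    hi = features.max(axis=0)
    # Assumption: a constant column has zero range, so divide by 1 and map it to 0.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (features - lo) / span


raw = [[1, 10], [3, 10], [5, 10]]   # e.g. login counts and a constant feature
scaled = min_max_normalize(raw)
```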
(3) Build the deep clustering model of activity.
As shown in FIG. 3, based on the high-dimensional activity sequence features integrated in step (2-3), cascaded autoencoders are constructed to extract key information, learner activity is initially clustered with the affinity propagation algorithm, and the clustering result is optimized by jointly training the cascaded autoencoders and the clustering algorithm.
(3-1) Activity key information extraction. A multi-layer stacked autoencoder is constructed and trained with greedy layer-wise training to extract low-dimensional key features that capture the implicit associations among the high-dimensional activity sequence features of platform login, course learning, homework submission, forum communication and course resource evaluation.
In the multi-layer stacked autoencoder, ReLU is used as the activation function and Adam as the optimizer. Several autoencoders are cascaded and fine-tuned iteratively, stage by stage, to minimize the reconstruction error, and the low-dimensional features $\{H_1, \dots, H_k, \dots, H_C\}$ that encode the complex associations among an online learner's high-dimensional sequence features are extracted layer by layer, where $H_k$ is the key feature extracted by the $k$-th hidden layer, $k = 1, 2, \dots, C$, and $C$ is the number of cascaded autoencoders. The autoencoders are cascaded as follows.
The high-dimensional activity sequence features obtained in (2-3) are taken as the original input $X \in \mathbb{R}^{d \times n}$ of the autoencoder. A linear mapping is built from the weight matrix $W_1$ and the encoding-layer bias $b_1$, and each input node is encoded through the nonlinear activation function $g(\cdot)$ to give the first hidden-layer output $H_1$ of the encoder:

$$H_1 = g(W_1 X + b_1)$$

Based on the first hidden-layer output $H_1$, with the decoding-layer bias set to $b_1'$, the reconstructed output $\hat{X}$ of the first-stage autoencoder's input samples is obtained by decoding:

$$\hat{X} = g(W_1^{T} H_1 + b_1')$$

$\hat{X}$ has the same dimensions as $X$.
Following the principle of minimizing the mean squared error (the loss function) between the original input and the reconstructed output of the decoding layer, gradient descent with error backpropagation is used to adjust the network weight matrix $W_1$, the encoding-layer bias $b_1$ and the decoding-layer bias $b_1'$. The loss function for iterative training is:

$$L(\theta) = \frac{1}{N}\sum_{i=1}^{N}\lVert x_i - \hat{x}_i\rVert_2^2 + \lambda\lVert W_1\rVert_2^2$$

where $x_i$ is the original input feature vector of the $i$-th learner, $\hat{x}_i$ is the corresponding reconstructed output of the decoder, $N$ is the number of input learners, $\theta = \{W_1, b_1, b_1'\}$ denotes the network parameters, $\lVert\cdot\rVert_2^2$ denotes the squared L2 norm, and $\lambda > 0$ is a hyperparameter.
After the first autoencoder has been trained, its hidden-layer output $H_1$ is taken as the input and, by the same steps, is encoded through the weight matrix $W_2$ and bias $b_2$ to give the second hidden-layer output $H_2$; the reconstruction $\hat{H}_1$ of $H_1$ is decoded using $W_2^{T}$ and the decoding-layer bias $b_2'$:

$$H_2 = g(W_2 H_1 + b_2), \qquad \hat{H}_1 = g(W_2^{T} H_2 + b_2')$$

Several autoencoders can be cascaded in the same way to extract, step by step, the low-dimensional key features that represent the implicit associations among a learner's high-dimensional activity sequence features.
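The cascade described in (3-1) can be sketched with NumPy as a chain of tied-weight layers. This example only shows the forward pass, the reconstruction and a regularized mean-squared-error loss; it omits the Adam-based greedy training itself. The class `TiedAutoencoderLayer`, the layer sizes 12, 8 and 3, applying ReLU in the decoder, and placing the penalty on the weight matrix are all assumptions for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class TiedAutoencoderLayer:
    """One autoencoder stage whose decoder reuses the transposed encoder weights."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.b_enc = np.zeros((n_hidden, 1))
        self.b_dec = np.zeros((n_in, 1))

    def encode(self, X):
        # X has shape (n_in, n_samples), matching the d x n layout in the text.
        return relu(self.W @ X + self.b_enc)

    def decode(self, H):
        return relu(self.W.T @ H + self.b_dec)

    def loss(self, X, lam=1e-3):
        """Mean squared reconstruction error plus an L2 penalty on the weights."""
        X_hat = self.decode(self.encode(X))
        mse = np.mean(np.sum((X - X_hat) ** 2, axis=0))
        return mse + lam * np.sum(self.W ** 2)


rng = np.random.default_rng(0)
X = rng.random((12, 5))                      # 12 features, 5 learners (toy data)
layers = [TiedAutoencoderLayer(12, 8, rng), TiedAutoencoderLayer(8, 3, rng)]

H = X
for layer in layers:                         # each stage encodes the previous stage's output
    H = layer.encode(H)
```

In greedy layer-wise training, each stage's loss would be minimized on its own input before the next stage is trained on the resulting hidden output.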
(3-2) Initial activity clustering. Using the affinity propagation algorithm, the learner similarity matrix is computed from the Euclidean distance, and the responsibility (attraction degree) and availability (attribution degree) between learners are updated iteratively. Typical learners are identified by whether the sum of a learner's responsibility and availability satisfies a given rule, which defines the activity classes, and the remaining learners are assigned to those classes.
(3-2-1) Similarity matrix calculation. The similarity between learners is computed from the Euclidean distance between the low-dimensional key features of learner activity extracted by the autoencoder in step (3-1).
The similarity $s(i,j)$ of learner $i$ and learner $j$ is calculated as:

$$s(i,j) = -\sqrt{\sum_{k=1}^{K}\left(x_{ik} - x_{jk}\right)^2}$$

where $K$ is the dimension of the low-dimensional key features extracted in (3-1), $x_{ik}$ is the value of the $k$-th low-dimensional key feature of the $i$-th learner, and $x_{jk}$ is the value of the $k$-th low-dimensional key feature of the $j$-th learner.
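A NumPy sketch of the similarity matrix follows. It assumes the similarity is the negative Euclidean distance, so that closer learners are more similar, and it fills the diagonal with the median of the off-diagonal similarities, following the median rule stated for $s(i,i)$ in (3-2-2). Both the sign convention and the exclusion of the diagonal from the median are assumptions.

```python
import numpy as np


def similarity_matrix(Z):
    """Negative Euclidean distance between rows of Z, with median preference on the diagonal."""
    Z = np.asarray(Z, dtype=float)
    diff = Z[:, None, :] - Z[None, :, :]
    S = -np.sqrt((diff ** 2).sum(axis=2))
    off_diagonal = S[~np.eye(len(Z), dtype=bool)]
    np.fill_diagonal(S, np.median(off_diagonal))
    return S


Z = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]   # toy low-dimensional key features
S = similarity_matrix(Z)
```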
(3-2-2) Cluster center screening. From the similarity matrix, two quantities are computed iteratively: the responsibility $r(i,j)$, which represents how well suited learner $j$ is to serve as the cluster center of learner $i$, and the availability $a(i,j)$, which represents how appropriate it is for learner $i$ to choose learner $j$ as its cluster center. Typical learners that can serve as cluster centers are selected by the rule that the sum of a learner's self-availability and self-responsibility is greater than 0, forming the activity cluster centers.
The update formulas for the responsibility $r(i,j)$ and the availability $a(i,j)$ are:

$$r(i,j) = s(i,j) - \max_{k \ne j}\{a(i,k) + s(i,k)\}, \quad k \in \{1, 2, \dots, N\}$$

$$a(i,j) = \begin{cases} \min\Big\{0,\ r(j,j) + \sum_{k \notin \{i,j\}} \max\{0, r(k,j)\}\Big\}, & i \ne j \\ \sum_{k \ne j} \max\{0, r(k,j)\}, & i = j \end{cases}$$

In the initial state every learner is a potential cluster center, and $s(i,i)$ is uniformly set to the median of the similarity matrix. To speed up convergence of the iterative updates of $r(i,j)$ and $a(i,j)$, a damping factor $\lambda$ with a value between 0 and 1 is introduced to weight the updates of the availability and the responsibility. The update rule is:

$$r_n \leftarrow (1-\lambda)\, r_n + \lambda\, r_{n-1}$$
$$a_n \leftarrow (1-\lambda)\, a_n + \lambda\, a_{n-1}$$

where $r_n$ and $a_n$ on the right-hand side are the values newly computed in iteration $n$, and $r_{n-1}$ and $a_{n-1}$ are the values from the previous iteration.
After all responsibilities and availabilities have been updated, learner $i$ is selected as a cluster center if the sum of its self-responsibility and self-availability is greater than 0:

$$r(i,i) + a(i,i) > 0$$
(3-2-3) Initial cluster partitioning. The sum of the attraction degree and the attribution degree between each remaining learner and each typical learner is calculated, and each remaining learner is assigned to the activity cluster center with the largest sum, which yields the initial clustering of learner activity.
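Steps (3-2-2) and (3-2-3) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the attribution degree update uses the standard affinity propagation availability formula (an assumption, since the original formula is only available as an image), and the function name, the default λ = 0.5 and the fixed iteration count are all hypothetical choices.

```python
import numpy as np

def affinity_propagation(s: np.ndarray, damping: float = 0.5, n_iter: int = 200):
    """Return (cluster-center indices, label per learner) for similarity matrix s."""
    n = s.shape[0]
    r = np.zeros((n, n))          # attraction degree (responsibility)
    a = np.zeros((n, n))          # attribution degree (availability)
    rows = np.arange(n)
    for _ in range(n_iter):
        # r(i,j) = s(i,j) - max_{k != j} [a(i,k) + s(i,k)]
        m = a + s
        best = m.argmax(axis=1)
        first = m[rows, best]
        m[rows, best] = -np.inf
        second = m.max(axis=1)
        r_new = s - first[:, None]
        r_new[rows, best] = s[rows, best] - second
        r = (1 - damping) * r_new + damping * r      # damped update
        # a(i,j) = min(0, r(j,j) + sum_{k not in {i,j}} max(0, r(k,j)))
        # a(j,j) = sum_{k != j} max(0, r(k,j))        (assumed standard form)
        rp = np.maximum(r, 0)
        np.fill_diagonal(rp, r.diagonal())
        a_new = rp.sum(axis=0)[None, :] - rp
        self_a = a_new.diagonal().copy()
        a_new = np.minimum(a_new, 0)
        np.fill_diagonal(a_new, self_a)
        a = (1 - damping) * a_new + damping * a      # damped update
    total = r + a
    centers = np.flatnonzero(total.diagonal() > 0)   # rule r(i,i) + a(i,i) > 0
    if centers.size == 0:
        return centers, np.full(n, -1)
    # Assign each learner to the center with the largest r + a sum.
    labels = total[:, centers].argmax(axis=1)
    labels[centers] = np.arange(centers.size)        # a center labels itself
    return centers, labels
```

The input `s` is expected to carry the preference values on its diagonal, as described for the initial state above. A production version would normally stop once the set of cluster centers stays unchanged for several iterations instead of running a fixed number of rounds.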
(3-3) Activity clustering optimization. An auxiliary target distribution is constructed from the initial clustering result. Using the KL divergence as the loss function, the stacked autoencoder and the neighbor propagation clustering algorithm are trained jointly, and the hidden-layer weights and bias parameters of the autoencoder are adjusted iteratively. This improves the fit between the key feature structure extracted by the deep model and the cluster analysis, and optimizes the activity clustering result.
The probability distribution Q of learner cluster membership is calculated from the output of the last autoencoder layer and the centroids of the clusters obtained in the initial cluster analysis, as follows:
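$$q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}$$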
where $q_{ij}$ is the probability that the activity of learner i belongs to class j, $z_i$ is the key feature of learner i obtained in (3-1), $\mu_j$ is the feature of the typical learner of activity class j, and α is the degree of freedom of the Student's t-distribution.
The auxiliary target probability distribution P is obtained by raising the cluster membership probabilities to a power and normalizing them, as follows:
$$p_{ij} = \frac{q_{ij}^2 / \sum_{i} q_{ij}}{\sum_{j'} \left( q_{ij'}^2 / \sum_{i} q_{ij'} \right)}$$
The difference between these two distributions over all learners, measured by the KL divergence, is the loss function $L_{CLU}$ of the joint training:
$$L_{CLU} = KL(P \,\|\, Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
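The three quantities used in the joint training can be sketched in NumPy. This assumes the formulation commonly used in deep embedded clustering (Student's t kernel for Q, squared and column-normalized Q for P, and KL(P‖Q) as the loss); the function names and the small `eps` constant are hypothetical, and the autoencoder weight update itself is not shown.

```python
import numpy as np

def soft_assignment(z: np.ndarray, mu: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Q: probability q_ij that learner i (feature z_i) belongs to class j (center mu_j)."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)   # squared distances
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)            # Student's t kernel
    return q / q.sum(axis=1, keepdims=True)                     # normalize per learner

def target_distribution(q: np.ndarray) -> np.ndarray:
    """P: square q_ij, divide by the soft class size, then normalize per learner."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_loss(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """L_CLU = sum_i sum_j p_ij * log(p_ij / q_ij); eps guards against log(0)."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Because P sharpens the confident assignments in Q, minimizing the loss pulls each learner's encoded feature toward the center it already most resembles.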
(4) Activity class feature analysis.
Based on the learner activity clustering result optimized in step (3), the features strongly associated with each activity class are found with an association rule algorithm, the semantic characteristics of each activity class are mined from these strongly associated features with a text analysis method, and the activity distribution of online learners with different attributes is presented with visualization methods.
(4-1) Activity association feature mining. The learner activity behavior sequence features constructed in step (2) are discretized by percentile into categorical variables with 5 levels {very low, low, medium, high, very high} and converted into a format suitable for association rule analysis, forming the feature set $X_{ap}$. The Apriori association rule algorithm is then used to calculate the support, confidence and lift (promotion degree) of the item sets that combine activity sequence features with each activity class, and to find the typical features strongly associated with each activity class.
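The discretization in (4-1) can be sketched as follows. The text only says the features are discretized "according to percentiles", so the quintile cut points (20th, 40th, 60th and 80th percentiles) are an assumption, as are the function name and the "feature-level" item format.

```python
import numpy as np

LEVELS = ["very low", "low", "medium", "high", "very high"]

def discretize(values, feature_name: str) -> list:
    """Map one numeric feature column to 'feature-level' items for rule mining."""
    values = np.asarray(values, dtype=float)
    # Assumed cut points: quintiles of the observed values.
    edges = np.percentile(values, [20, 40, 60, 80])
    # right=True puts a value equal to an edge into the lower level.
    level_index = np.digitize(values, edges, right=True)   # integers 0..4
    return [f"{feature_name}-{LEVELS[i]}" for i in level_index]
```

Each learner's items from all feature columns would then be collected into one transaction, which is the input format association rule algorithms such as Apriori expect.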
Suppose that the item set denoted by Figure GDA0003833268160000153 is a subset of the learner's activity behavior sequence feature set, for example {course learning duration in one week-very high, course learning frequency in one week-high, forum communication frequency in one week-low}; $Y_c$, c ∈ {1, 2, …, k}, is a given learner activity class, and k is the number of learner activity classes. The association rule analysis consists of two main steps.
First, find the frequent item sets of behavior features related to $Y_c$. The frequency with which activity class $Y_c$ and a specific behavior sequence feature item set (Figure GDA0003833268160000161) occur together is called the support of the combined item set (Figure GDA0003833268160000162), and is recorded as:
Figure GDA0003833268160000163
where N is the total number of learners in the analysis sample. When the support exceeds a set threshold, the item set (Figure GDA0003833268160000164) is called a frequent item set. The threshold is generally set to 0.5 and can be adjusted according to actual conditions.
Second, generate the rules associated with $Y_c$ from the frequent item sets. The probability that learners whose activity class is $Y_c$ have the features in the item set (Figure GDA0003833268160000165) is calculated; this probability is also called the confidence of the rule (Figure GDA0003833268160000166), and is calculated as follows:
Figure GDA0003833268160000167
When the confidence is greater than the set threshold, the rule (Figure GDA0003833268160000168) can be considered reliable, i.e. the item set (Figure GDA0003833268160000169) is a collection of features closely related to $Y_c$. The confidence threshold is generally set to 0.5 and can be adjusted according to actual conditions.
After the confidence is calculated, the sequence features closely related to $Y_c$ must be further screened according to the lift index. For learners who have the behavior features in the item set (Figure GDA00038332681600001610), the lift compares having activity class $Y_c$ with having an activity class other than $Y_c$. If the value is greater than 1, learners with the features (Figure GDA00038332681600001611) have a higher probability of belonging to activity class $Y_c$, i.e. the features and the class are closely related. The lift is calculated as follows:
Figure GDA00038332681600001612
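The three indices can be illustrated for a single candidate rule with plain Python. This is not a full Apriori implementation (no candidate generation or pruning), and the exact formulas are assumptions because the originals are only available as images: support is the share of all learners who have both the feature item set and the class, confidence is the share of learners in the class who have the item set, and lift is the confidence divided by the overall share of learners with the item set. The function and parameter names are hypothetical.

```python
def rule_metrics(transactions, labels, itemset, target_class):
    """Return (support, confidence, lift) for one feature item set and one class.

    transactions: one collection of 'feature-level' items per learner
    labels:       activity class of each learner, in the same order
    """
    n = len(transactions)
    itemset = set(itemset)
    has_items = [itemset <= set(t) for t in transactions]
    in_class = [label == target_class for label in labels]
    n_items = sum(has_items)
    n_class = sum(in_class)
    n_both = sum(x and y for x, y in zip(has_items, in_class))

    support = n_both / n
    confidence = n_both / n_class if n_class else 0.0
    # Assumed standard lift: > 1 means the item set and the class co-occur
    # more often than they would if they were independent.
    lift = confidence / (n_items / n) if n_items else 0.0
    return support, confidence, lift
```

In practice the support threshold would first prune infrequent item sets, and only the surviving ones would be scored for confidence and lift.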
(4-2) Semantic analysis of activity association features. A text analysis method is used to calculate the frequency of the strongly associated features of each activity class and the co-occurrence frequency among the features; the results are visualized as word clouds and co-occurrence networks to mine the semantic characteristics of each activity class.
The frequency of the associated features is calculated with the TF-IDF method, as follows:
Figure GDA0003833268160000171
where the quantity denoted by Figure GDA0003833268160000172 is the number of learners in activity class $Y_c$ who have the strongly associated feature $X_i$, and $N_c$ is the total number of learners whose activity class is $Y_c$;
Figure GDA0003833268160000173
$$tfidf_i^c = tf_i^c \times idf_i^c$$
where $tfidf_i^c$ is the weighted frequency with which the strongly associated feature $X_i$ occurs among learners of activity class $Y_c$.
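The TF-IDF weighting can be sketched as follows. The term frequency follows the definition given above (learners in the class with the feature, divided by the class size). The inverse document frequency formula is only available as an image, so the sketch assumes the common form log(k / k_i), where k is the number of activity classes and k_i is the number of classes in which feature i appears as a strongly associated feature. All names are hypothetical.

```python
import math

def tfidf_scores(feature_counts: dict, class_sizes: dict) -> dict:
    """feature_counts[c][feature] = learners in class c with that strongly
    associated feature; class_sizes[c] = total learners in class c."""
    k = len(feature_counts)
    scores = {}
    for c, counts in feature_counts.items():
        scores[c] = {}
        for feature, n_with_feature in counts.items():
            tf = n_with_feature / class_sizes[c]
            # Assumed idf: classes that share a feature make it less distinctive.
            k_i = sum(1 for other in feature_counts.values() if feature in other)
            idf = math.log(k / k_i)
            scores[c][feature] = tf * idf
    return scores
```

Under this assumption a feature that is strongly associated with every class scores 0, so the word cloud for a class emphasizes the features that distinguish it from the others.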
(4-3) Activity distribution visualization. Line charts, heat maps, bubble charts and calendar charts are used to present the distribution of learner activity across demographic, region attribution and platform registration attributes.
Those matters not described in detail in this specification are well within the knowledge of those skilled in the art.
Those skilled in the art will readily appreciate that the foregoing is only a preferred embodiment of this invention and is not intended to limit the invention to the details shown. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. An online learner activity evaluation method based on a deep clustering algorithm, characterized by comprising the following steps:
S1. Multi-source online learning information collection: collecting learner attribute information, including demographics, region attribution and platform registration, from the online platform management system by database operations; collecting online behavior information on learner platform login, course learning, assignment submission, forum communication and course resource evaluation by code-level event tracking (point-burying); and acquiring online resource information on courses, assignments, topic posts and course resources from the online platform by web crawler technology;
S2. Activity sequence feature construction: cleaning the learner attribute, online behavior and online resource information by database queries and descriptive statistical analysis; mining high-dimensional online behavior sequence features over different time periods by a time-domain descriptive statistical method; and linearly transforming the time sequence features according to a data normalization rule to unify the scales of all data;
S3. Activity deep clustering modeling: constructing a stacked autoencoder and extracting low-dimensional key features that represent the implicit associations among the learner activity sequence features; identifying activity classes with the neighbor propagation (affinity propagation) algorithm and assigning learners to the activity classes; and jointly training the autoencoder and the neighbor propagation algorithm from the initial clustering of each activity class to optimize the activity clustering result;
S4. Activity class feature analysis: calculating the support, confidence and lift of the activity sequence features with respect to each activity class by an association rule algorithm, and finding the typical features strongly associated with each activity class; mining the semantic characteristics of the typical features of each activity class by a text analysis method; and presenting the distribution of learner activity for each demographic, region attribution and platform registration attribute by a visualization method.
2. The online learner activity evaluation method based on a deep clustering algorithm according to claim 1, wherein the multi-source online learning information collection of step S1 specifically comprises:
S1.1. Learner attribute information extraction: using SQL statements to extract, from the database tables of the online platform management system, learner demographic information including gender, age, grade and major; region attribution information including the codes of the school, county, city and province to which the learner belongs; and platform registration information including learner ID and registration time, wherein the learner ID is the index linking learner attributes with online behavior and online resource information;
S1.2. Online behavior information collection: collecting, according to an event model and by code-level event tracking, online behavior information on learner platform login, course learning, assignment submission, forum communication and course resource evaluation to form a real-time behavior operation log; the behavior operation log comprises learner ID, operation object, behavior event type, operation time and operation object ID; the learner ID is the index linking online behavior with learner attribute information;
S1.3. Online resource information crawling: acquiring online resource information on courses, assignments, topic posts and course resources from the online platform by web crawler technology.
3. The online learner activity evaluation method based on a deep clustering algorithm according to claim 2, wherein in the online behavior information collection of step S1.2:
the behavior event types comprise login and logout for the platform; start and end for course learning; submission for assignments; posting and replying for forum communication; and commenting, liking, forwarding and bookmarking for course resource evaluation;
the operation objects comprise the platform, courses, assignments, topic posts and course resources; the operation object ID is the collective term for the platform, course, assignment, topic post and course resource IDs, and is the index linking online behavior with resource information.
4. The online learner activity evaluation method based on a deep clustering algorithm according to claim 2, wherein in the online resource information crawling of step S1.3:
the course information comprises course ID, name and type; the types comprise general education, elective and required courses; the course ID is the index linking a course with assignment, topic post and course resource information;
the assignment information comprises assignment ID, type, submission time and course ID; the types comprise document, audio and video; the course ID is the index linking the assignment with course information, and the assignment ID is the index linking the assignment with online behavior information;
the topic post information comprises topic post ID, publication time and course ID; the course ID is the index linking the topic post with course information, and the topic post ID is the index linking the topic post with online behavior information;
the course resource information comprises course resource ID, type and course ID; the types comprise document, audio and video; the course ID is the index linking the course resource with course information, and the course resource ID is the index linking the course resource with online behavior information.
5. The online learner activity evaluation method based on a deep clustering algorithm according to claim 1, wherein the activity sequence feature construction of step S2 specifically comprises:
S2.1. Multi-source information cleaning: using database queries and descriptive statistical analysis on the index fields to precisely match learner attribute and online resource information, filter abnormal data and fill in missing values; for the behavior logs, taking the first record of each type of activity behavior operation as a valid record in chronological order, calculating one by one the time interval between each record and the previous valid record of the same type, and deleting abnormal records whose interval is less than a set threshold;
S2.2. Sequence feature mining: classifying and linking the online behavior and resource information of platform login, course learning, assignment submission, forum communication and course resource evaluation according to the index fields; transforming the real-time behavior operation logs by time-domain descriptive statistical analysis; and mining the frequency-type and duration-type sequence features of each kind of learner online behavior over multiple periods (day, week and month) and multiple course types (required, elective and general education);
S2.3. Sequence feature normalization: linearly transforming the frequency-type and duration-type time sequence features of each kind of online behavior by min-max normalization to unify the data scales; and merging the time sequence features of all kinds of online behavior in each period by learner ID to form the high-dimensional activity sequence features.
6. The online learner activity evaluation method based on a deep clustering algorithm according to claim 5, wherein the sequence feature mining of step S2.2 specifically comprises:
platform login features: calculating the duration of a single login from the time interval between a learner's matching login and logout operations on the platform; and calculating the number of logins and the total login duration of the learner in each period from the learner's platform login records and the corresponding single-login durations;
course learning features: calculating the duration of a single learning session from the time interval between the start and end of a learner's study of the same course; and calculating the number of sessions and the total duration of learning general education, required and elective courses in each period from the learner's course learning start records and single-session durations;
assignment submission features: calculating the number of document, audio and video assignments submitted in each type of course in each period from the learner's assignment submission records;
forum communication features: calculating the number of posts and replies in each type of course in each period from the records of the learner's posting and replying in forum communication on course learning content;
course resource evaluation features: calculating the number of times the learner commented on, liked, forwarded and bookmarked document, audio and video course resources in each period from the records of the learner's course resource evaluation behavior.
7. The online learner activity evaluation method based on a deep clustering algorithm according to claim 1, wherein the activity deep clustering modeling of step S3 specifically comprises:
S3.1. Activity key information extraction: constructing a multilayer stacked autoencoder and using greedy layer-wise training to extract low-dimensional key features that capture the implicit associations among the high-dimensional activity sequence features derived from learner platform login, course learning, assignment submission, forum communication and course resource evaluation;
S3.2. Initial activity clustering: using the neighbor propagation algorithm, calculating the learner similarity matrix with the Euclidean distance formula and iteratively updating the attraction degree and attribution degree between learners; identifying typical learners according to whether the sum of the two satisfies a given rule, forming the activity classes, and assigning the remaining learners to the activity classes;
S3.3. Activity clustering optimization: constructing an auxiliary target distribution from the initial clustering result; using the KL divergence function to jointly train the stacked autoencoder and the neighbor propagation clustering algorithm; and iteratively adjusting the hidden-layer weights and bias parameters of the autoencoder, so as to improve the fit between the key feature structure extracted by the deep model and the cluster analysis and to optimize the activity clustering result.
8. The online learner activity evaluation method based on a deep clustering algorithm according to claim 7, wherein the initial activity clustering of step S3.2 specifically comprises:
S3.2.1. Similarity matrix calculation: calculating the similarity between learners with the Euclidean distance formula from the low-dimensional key features of learner activity extracted by the autoencoder in S3.1;
S3.2.2. Cluster center screening: iteratively calculating, from the similarity matrix, the attraction degree, which represents how well suited a learner is to serve as a cluster center, and the attribution degree, which represents how appropriate it is for a learner to belong to a given cluster center; and screening the typical learners that can serve as cluster centers by the rule that the sum of a learner's self-attribution degree and self-attraction degree is greater than 0, forming the activity cluster centers;
S3.2.3. Initial cluster partitioning: calculating the sum of the attraction degree and the attribution degree between each remaining learner and each typical learner, and assigning each remaining learner to the activity cluster center with the largest sum, thereby obtaining the initial clustering of learner activity.
9. The online learner activity evaluation method according to claim 1, wherein the activity class feature analysis of step S4 specifically comprises:
S4.1. Activity association feature mining: transforming the learner activity sequence features constructed in S2 by a data discretization method to form feature item sets; and using the Apriori association rule algorithm to calculate the support, confidence and lift of the item sets that combine activity sequence features with each activity class, and to find the typical features strongly associated with each activity class;
S4.2. Semantic analysis of activity association features: using a text analysis method to calculate the occurrence frequency of the strongly associated features of each activity class and the co-occurrence frequency among the features, and visualizing the results as word clouds and co-occurrence networks to mine the semantic characteristics of each activity class;
S4.3. Activity distribution visualization: presenting the distribution of learner activity for each demographic, region attribution and platform registration attribute with line charts, heat maps, bubble charts and calendar charts.
CN202111649117.9A 2021-12-30 2021-12-30 Online learner activity degree evaluation method based on deep clustering algorithm Active CN114429281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111649117.9A CN114429281B (en) 2021-12-30 2021-12-30 Online learner activity degree evaluation method based on deep clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111649117.9A CN114429281B (en) 2021-12-30 2021-12-30 Online learner activity degree evaluation method based on deep clustering algorithm

Publications (2)

Publication Number Publication Date
CN114429281A CN114429281A (en) 2022-05-03
CN114429281B true CN114429281B (en) 2022-11-15

Family

ID=81311268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111649117.9A Active CN114429281B (en) 2021-12-30 2021-12-30 Online learner activity degree evaluation method based on deep clustering algorithm

Country Status (1)

Country Link
CN (1) CN114429281B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114662995A (en) * 2022-05-19 2022-06-24 山东经贸职业学院 Online learning effect evaluation method and system based on artificial intelligence
CN114707471B (en) * 2022-06-06 2022-09-09 浙江大学 Artificial intelligent courseware making method and device based on hyper-parameter evaluation graph algorithm

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190099156A (en) * 2019-08-06 2019-08-26 엘지전자 주식회사 Method and device for authenticating user using user's behavior pattern
CN110782375A (en) * 2019-09-05 2020-02-11 华南师范大学 Online learning overall process dynamic analysis method and system based on data
CN111460249A (en) * 2020-02-24 2020-07-28 桂林电子科技大学 Personalized learning resource recommendation method based on learner preference modeling
WO2020182710A1 (en) * 2019-03-12 2020-09-17 F. Hoffmann-La Roche Ag Multiple instance learner for prognostic tissue pattern identification
CN112101039A (en) * 2020-08-05 2020-12-18 华中师范大学 Learning interest discovery method for online learning community
CN112115357A (en) * 2020-09-11 2020-12-22 华中师范大学 Online course forum interaction mode identification method and system
CN113077100A (en) * 2021-04-16 2021-07-06 西安交通大学 Online learning potential exit prediction method based on automatic coding machine

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
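Analysis and prediction model of student academic performance based on clustering algorithms; Chen Lai et al.; Journal of Shanxi Police College; 2020-04-15 (No. 02); full text *
Research on learning behavior analysis for MOOC classroom feedback; Gu Xin et al.; Journal of Central China Normal University (Natural Sciences); 2018-08-10 (No. 04); full text *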

Also Published As

Publication number Publication date
CN114429281A (en) 2022-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant