CN114429281B

CN114429281B - Online learner activity degree evaluation method based on deep clustering algorithm

Info

Publication number: CN114429281B
Application number: CN202111649117.9A
Authority: CN
Inventors: 卢春; 李淼云; 吴砥; 钟正; 徐建
Original assignee: Central China Normal University
Current assignee: Central China Normal University
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2022-11-15
Anticipated expiration: 2041-12-30
Also published as: CN114429281A

Abstract

The invention relates to the field of data analysis and provides an online learner activity evaluation method based on a deep clustering algorithm. The method uses database operations, code instrumentation (event tracking) and web crawler technology to obtain multi-source information about learner attributes, online behavior and online resources from an online platform. Following full-view learning theory, it comprehensively evaluates online learners' activity in terms of their interactions with the online platform, the learning content and other learners. It then uses visualization to present the activity distribution of learners grouped by demographics, regional affiliation and platform registration attributes, providing a new approach to evaluating the application and service level of online platforms.

Description

Online learner activity degree evaluation method based on deep clustering algorithm

Technical Field

The invention relates to the field of data analysis, in particular to an online learner activity evaluation method based on a deep clustering algorithm.

Background

With the deep integration of information technology and education, online education can meet learners' personalized and customized needs and has become a normalized mode of education and teaching. Online learning platforms are the main carrier of online education and retain large amounts of real, detailed data on learner operations. Mining these sequence data can reveal the latent associations behind learner behavior and the patterns associated with learning activities. Exploring the activity of learners with different backgrounds on a platform gives effective insight into their behavioral habits and patterns. This insight can be used to optimize their online learning experience, strengthen their investment in online learning and their use of resources, and further improve learning quality. For an online learning platform, analyzing learner activity reveals learners' behavior patterns and their implicit evaluation of resources and services, and provides direction for improving the platform and optimizing learner services.

Current research mainly measures and characterizes learner activity by the number of visits a learner makes over a period. This measure is simple and coarse. Although it identifies some highly active learners, it misses learners who rank low on such statistics but whose activity is of high quality, for example learners who log in infrequently but interact frequently with other learners and are active on a regular schedule. Simple statistics often fail to reflect the importance of such users.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides an online learner activity evaluation method based on a deep clustering algorithm. Following full-view learning theory, the method comprehensively evaluates the activity of online learners from their interactions with the online platform, the learning content and other learners. It provides a new approach to evaluating the application and service level of online platforms.

The purpose of the invention is achieved by the following technical measures.

The invention provides an online learner activity evaluation method based on a deep clustering algorithm, which comprises the following steps:

S1, collecting multi-source online learning information. Learner attribute information, including demographics, regional affiliation and platform registration, is collected from the online platform management system using database operations. Online behavior information on learners' platform login, course learning, homework submission, forum communication and course resource evaluation is collected using code instrumentation (event tracking). Online resource information on courses, homework, topic posts and course resources is acquired from the online platform using web crawler technology.

S2, constructing activity sequence features. Learner attribute, online behavior and online resource information is cleaned using database queries and descriptive statistical analysis. High-dimensional online behavior sequence features over different time periods are mined using time-domain descriptive statistics. The sequence features are then linearly transformed according to a data normalization rule so that all features share a common scale.

S3, deep clustering modeling of activity. A cascaded (stacked) autoencoder is constructed to extract low-dimensional key features representing the implicit associations among learner activity sequence features. Activity classes are identified, and learners are assigned to them, using the affinity propagation (neighbor propagation) algorithm. Starting from this initial clustering, the autoencoder and the affinity propagation algorithm are trained jointly to optimize the activity clustering result.

S4, analyzing the characteristics of each activity class. An association rule algorithm computes the support, confidence and lift between activity sequence features and each activity class to find the typical features strongly associated with each class. A text analysis method mines the semantic features of each class's typical features. A visualization method presents the distribution of learner activity across demographic, regional affiliation and platform registration attributes.

The beneficial effects of the invention are as follows.

Multi-source information on learner attributes, online behavior and online resources is obtained from an online platform using database operations, code instrumentation and web crawler technology. Based on full-view learning theory and the learner's interactions with the online platform, the learning content and other learners, high-dimensional activity sequence features of the duration and frequency types are constructed for platform login, course learning, homework submission, forum communication and course resource evaluation. These features cover different time periods, course types and course resource types. An autoencoder network extracts low-dimensional key features representing the correlations among the high-dimensional activity sequence features, and the affinity propagation algorithm produces an initial clustering of learner activity. The autoencoder network and the affinity propagation algorithm are then trained jointly, adjusting the autoencoder parameters to optimize the clustering result. An association rule algorithm finds the sequence features strongly associated with each activity class. Text analysis computes the frequency of the strongly associated features and their co-occurrence within each class to mine the semantic features of each class. Finally, a visualization method presents the activity distribution of learners by demographic, regional affiliation and platform registration attributes.

The method makes it convenient for online platform administrators to evaluate the application and service level of the platform from learner activity. It points out directions for improving platform services and optimizing the learner experience, and thereby helps improve learning efficiency and learning quality.

Drawings

FIG. 1 is a schematic diagram of the online learner activity level assessment method based on a deep clustering algorithm according to the present invention.

Fig. 2 is a structure diagram of multi-source online learning information content in the invention.

FIG. 3 is a schematic diagram of a deep clustering algorithm for online learner liveness assessment in accordance with the present invention.

Detailed Description

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the present invention will be further described in detail with reference to the accompanying drawings and the embodiments, and the embodiments described herein are only for explaining the present invention and are not intended to limit the present invention.

As shown in FIG. 1, an embodiment of the present invention provides an online learner activity evaluation method based on a deep clustering algorithm, including the following steps:

(1) Collecting multi-source online learning information.

As shown in FIG. 2, the information for assessing the liveness of an online learner in the present invention includes learner attributes, online behavior, and online resources. The specific manner and details of collecting various types of information are described below.

(1-1) Learner attribute information collection. SQL statements are used to extract three kinds of information from the database tables of the online platform management system. These are the learner's demographic information (gender, age, grade and major), regional affiliation information (the codes of the learner's school, county, city and province) and platform registration information (learner ID and registration time). The learner ID is the index that links a learner's individual attributes with learning behavior and online resource information.

(1-2) Online behavior information collection. Following an event model, code instrumentation (event tracking) collects online behavior information on learners' platform login, course learning, homework submission, forum communication and course resource evaluation, producing a real-time behavior operation log. Each log record contains the learner ID, operation object, behavior event type, operation time and operation object ID. The learner ID is the index that links online behavior with learner attribute information. Table 1 shows sample records from the behavior operation log of a single learner.

TABLE 1 Example of a learner's online behavior operation log

Learner ID	Operation object	Behavior event type	Operation time	Operation object ID
stu000001	Platform	Log in	T1	000000
stu000001	Platform	Log out	T2	000000
stu000001	Course	Start learning	T3	clp000001
stu000001	Course	End learning	T4	clp000001
stu000001	Homework	Submit homework	T7	hw000001
stu000001	Topic post	Post	T8	tp000001
stu000001	Topic post	Reply	T9	tp000001
stu000001	Course resource	Comment on course resource	T10	res000001
stu000001	Course resource	Like course resource	T11	res000001
stu000001	Course resource	Forward course resource	T12	res000001
stu000001	Course resource	Bookmark (collect) course resource	T13	res000001

(1-2-1) Behavior event types include:
- platform login and logout;
- course learning start and end;
- homework submission;
- forum posting and replying;
- commenting on, liking, forwarding and bookmarking (collecting) course resources.

(1-2-2) Operation objects include the platform, courses, homework, topic posts and course resources. The operation object ID field covers the IDs of all these objects and is the index that links online behavior with online resource information.

(1-3) Online resource information collection. Web crawler technology is used to acquire online resource information on courses, homework, topic posts and course resources from the online platform.

(1-3-1) Course information, including course ID, name and type. Course types are general education, elective and compulsory. The course ID is the index that links a course with its homework, topic post and course resource information.

(1-3-2) Homework information, including homework ID, type, submission time and the ID of the course it belongs to. Homework types are document, audio and video. The course ID links homework with course information, and the homework ID links homework with online behavior information.

(1-3-3) Topic post information, including topic post ID, publication time and the ID of the course it belongs to. The course ID links topic posts with course information, and the topic post ID links topic posts with online behavior information.

(1-3-4) Course resource information, including course resource ID, type and the ID of the course it belongs to. Course resource types are document, audio and video. The course ID links course resources with course information, and the course resource ID links course resources with online behavior information.

(2) Constructing activity sequence features.

Starting from the multi-source online learning information collected in step (1), high-dimensional activity sequence features suitable for deep clustering are constructed through data cleaning, feature mining and feature reduction.

(2-1) Multi-source information cleaning. Using the index fields, database queries and descriptive statistical analysis match learner attribute and online resource information precisely, filter abnormal data and fill in missing values. Log records of each type of active behavior are then processed in time order:
1. The first record is taken as valid.
2. For each subsequent record, the interval to the previous valid record of the same type is computed.
3. Records of platform login, course learning, homework submission, forum communication and course resource evaluation whose interval is below a set threshold are deleted as abnormal.

Taking platform login as an example, the validity of the n-th login record is determined by

$$v_n = \begin{cases} 1, & t_n - t_{\text{last}} \ge \tau \\ 0, & t_n - t_{\text{last}} < \tau \end{cases}, \qquad n \ge 2,$$

where:
- $v_n$ indicates whether the n-th login is valid: 1 means the record is valid and retained, 0 means it is invalid and deleted;
- $t_n$ is the time of the n-th login, with n starting from 2 because the first record is always valid;
- $t_{\text{last}}$ is the time of the previous valid login;
- $\tau$ is the set threshold.

The threshold is 30 min for platform login, course learning and homework submission and 10 min for course resource evaluation. It can be adjusted to the actual situation.
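The interval rule above can be sketched in Python as follows. This is a minimal illustration that assumes the records of one learner and one behavior type are already sorted by time. The function name `filter_valid_records` and the keys of `THRESHOLDS` are hypothetical. The threshold values are the defaults stated above; the text gives no separate threshold for forum communication, so none is assumed here.

```python
from datetime import datetime, timedelta

# Default thresholds from the text; the dictionary keys are illustrative names.
THRESHOLDS = {
    "login": timedelta(minutes=30),
    "course_learning": timedelta(minutes=30),
    "homework_submission": timedelta(minutes=30),
    "resource_evaluation": timedelta(minutes=10),
}

def filter_valid_records(times, threshold):
    """Return the records with v_n = 1 for one learner and one behavior type.

    `times` must be sorted ascending. The first record is always valid; each
    later record is kept only if it is at least `threshold` after the most
    recent *valid* record (t_last), otherwise it is dropped (v_n = 0).
    """
    if not times:
        return []
    kept = [times[0]]
    for t in times[1:]:
        if t - kept[-1] >= threshold:
            kept.append(t)
    return kept

base = datetime(2021, 12, 6, 9, 0)
logins = [base + timedelta(minutes=m) for m in (0, 10, 35, 60)]
valid = filter_valid_records(logins, THRESHOLDS["login"])
print([t.strftime("%H:%M") for t in valid])  # ['09:00', '09:35']
```

The comparison is made against the last *kept* record, not the last raw record. In this example, the 10:00 login is therefore dropped: it is only 25 min after 09:35, even though it is 50 min after the discarded 09:10 record.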

(2-2) Sequence feature mining. Using the index fields, the active behaviors of platform login, course learning, homework submission, forum communication and course resource evaluation are linked by category with the online resource information. Time-domain descriptive statistics then transform the real-time behavior operation log. This yields frequency-type and duration-type sequence features of each kind of online behavior over multiple time periods (day, week and month) and multiple course types (compulsory, elective and general education).

(2-2-1) Platform login features. The duration of a single login is the interval between a learner's login and the matching logout on the platform. From the learner's login records and the corresponding single-login durations, the number of logins and the total login duration in each period are computed.

(2-2-2) Course learning features. The duration of a single learning session is the interval between the start and end of learning the same course. From the learner's course learning start records and single-session durations, the number and total duration of the learner's general education, compulsory and elective course learning sessions in each period are computed.

(2-2-3) Homework submission features. From the learner's homework submission records, the number of document, audio and video homework submissions in each course type in each period is computed.

(2-2-4) Forum communication features. From the learner's posting and replying records about course content in the forum, the number of posts and replies in each course type in each period is computed.

(2-2-5) Course resource evaluation features. From the learner's course resource evaluation records, the number of comments, likes, forwards and bookmarks on document, audio and video course resources in each period is computed.
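As an illustration of (2-2-1), the sketch below builds the weekly login features. It pairs each learner's most recent login with the next logout and aggregates the number of sessions and the total minutes per ISO week. It assumes a simplified event tuple `(learner_id, event_type, timestamp)` with event types `"login"` and `"logout"`, and it counts only completed login/logout pairs. The tuple layout and the function name `weekly_login_features` are hypothetical. The same grouping pattern extends to the other behaviors and to day or month periods.

```python
from collections import defaultdict
from datetime import datetime

def weekly_login_features(events):
    """Aggregate (login count, total minutes) per (learner, ISO week).

    `events` is an iterable of (learner_id, event_type, timestamp) tuples
    sorted by time. A session is the learner's most recent login closed by
    the next logout; unmatched logouts are ignored.
    """
    open_login = {}
    feats = defaultdict(lambda: [0, 0.0])
    for learner, event_type, ts in events:
        if event_type == "login":
            open_login[learner] = ts
        elif event_type == "logout" and learner in open_login:
            start = open_login.pop(learner)
            year, week, _ = start.isocalendar()
            key = (learner, f"{year}-W{week:02d}")
            feats[key][0] += 1  # frequency-type feature
            feats[key][1] += (ts - start).total_seconds() / 60  # duration-type feature
    return {k: (count, minutes) for k, (count, minutes) in feats.items()}

events = [
    ("stu000001", "login", datetime(2021, 12, 6, 9, 0)),
    ("stu000001", "logout", datetime(2021, 12, 6, 9, 45)),
    ("stu000001", "login", datetime(2021, 12, 8, 20, 0)),
    ("stu000001", "logout", datetime(2021, 12, 8, 21, 0)),
    ("stu000001", "login", datetime(2021, 12, 14, 8, 0)),
    ("stu000001", "logout", datetime(2021, 12, 14, 8, 30)),
]
weekly = weekly_login_features(events)
print(weekly)
```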

Table 2 presents the specific methods and formulas for calculating the frequency-type and duration-type sequence features of each type of online behavior in each period.

TABLE 2 Example calculation methods for activity sequence features within a period

(2-3) Sequence feature reduction. The frequency-type and duration-type sequence features of each kind of online behavior are linearly transformed with min-max normalization to put all features on a common scale. The normalized sequence features of all behaviors in all periods are then concatenated by learner ID to form the high-dimensional activity sequence features.

The min-max normalization is

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$

where x is the original feature value, x' is the normalized feature value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the original feature.
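A column-wise NumPy version of this formula is sketched below, assuming one row per learner and one column per sequence feature. The function name is hypothetical. Mapping constant columns to 0 is an added assumption to avoid division by zero; the text does not specify how to handle them.

```python
import numpy as np

def min_max_normalize(X):
    """Column-wise min-max normalization x' = (x - x_min) / (x_max - x_min).

    Rows are learners, columns are sequence features. Constant columns
    (x_max == x_min) are mapped to 0 to avoid division by zero; this
    handling is an assumption, not part of the described method.
    """
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span

# Hypothetical features: weekly login count and weekly login minutes.
X = [[1, 45.0], [3, 105.0], [5, 30.0]]
X_norm = min_max_normalize(X)
print(X_norm)
```

In a deployed pipeline, `x_min` and `x_max` would normally be stored, so that later learners are scaled with the same parameters.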

(3) Deep clustering modeling of activity.

As shown in FIG. 3, the process starts from the high-dimensional activity sequence features integrated in step (2-3). A cascaded autoencoder is constructed to extract key information, and learner activity is initially clustered with the affinity propagation algorithm. The clustering result is then optimized by jointly training the cascaded autoencoder and the clustering algorithm.

(3-1) Key activity information extraction. A multi-layer stacked autoencoder is constructed and trained greedily, layer by layer. It extracts low-dimensional key features that capture the implicit associations among the high-dimensional activity sequence features of platform login, course learning, homework submission, forum communication and course resource evaluation.

In the multi-layer stacked autoencoder, ReLU is used as the activation function and Adam as the optimizer. Several autoencoders are cascaded and fine-tuned iteratively, stage by stage, to minimize the reconstruction error. Layer by layer, they extract the low-dimensional features $\{H^{(1)}, \dots, H^{(k)}, \dots, H^{(C)}\}$ that capture the complex associations among the online learner's high-dimensional sequence features. Here $H^{(k)}$ is the key feature extracted by the k-th hidden layer, $k = 1, 2, \dots, C$, and C is the number of cascaded autoencoders. The autoencoders are cascaded as follows.

The high-dimensional activity sequence features obtained in (2-3) form the original autoencoder input $X \in \mathbb{R}^{d \times n}$. A linear mapping is constructed from the weight matrix $W^{(1)}$ and the bias $b^{(1)}$. Each input node is then encoded through the nonlinear activation function $g(\cdot)$ to obtain the first hidden-layer output of the encoder:

$$H^{(1)} = g\left(W^{(1)} X + b^{(1)}\right).$$

From the first hidden-layer output $H^{(1)}$ and the decoding-layer bias $\tilde{b}^{(1)}$, the decoder computes

$$\hat{X} = g\left(W^{(1)\top} H^{(1)} + \tilde{b}^{(1)}\right),$$

which is the reconstruction of the first-stage autoencoder's input samples and has the same dimension as X.

The training objective is to minimize the mean squared error (loss function) between the original input and the reconstructed output of the decoding layer. Gradient descent with error back-propagation adjusts the network weight matrix $W^{(1)}$, the encoding-layer bias $b^{(1)}$ and the decoding-layer bias $\tilde{b}^{(1)}$. The loss function for iterative training is

$$L(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left\lVert x_i - \hat{x}_i \right\rVert_2^2 + \lambda \left\lVert W^{(1)} \right\rVert_2^2,$$

where:
- $x_i$ is the original activity sequence feature vector of the i-th learner and $\hat{x}_i$ is its reconstruction by the decoder;
- N is the number of input learners;
- $\theta = \{W^{(1)}, b^{(1)}, \tilde{b}^{(1)}\}$ is the set of network parameters;
- $\lVert\cdot\rVert_2^2$ denotes the squared L2 norm;
- $\lambda > 0$ is a hyperparameter.

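After the first autoencoder has been trained, its hidden-layer output $H^{(1)}$ is used as the input of the next stage, which follows the same steps. Encoding with the weight matrix $W^{(2)}$ and bias $b^{(2)}$ gives the second hidden-layer output

$$H^{(2)} = g\left(W^{(2)} H^{(1)} + b^{(2)}\right),$$

and decoding with $W^{(2)\top}$ and the bias $\tilde{b}^{(2)}$ gives the reconstruction of $H^{(1)}$:

$$\hat{H}^{(1)} = g\left(W^{(2)\top} H^{(2)} + \tilde{b}^{(2)}\right).$$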

Further autoencoders can be cascaded in the same way to progressively extract low-dimensional key features that capture the implicit associations in the learner's high-dimensional activity sequence features.
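Greedy layer-wise pretraining of the cascade can be sketched in NumPy as below. This is a sketch under stated assumptions, not the patented implementation:
- each layer is a tied-weight autoencoder with a ReLU encoder;
- the decoder is assumed linear, which simplifies the gradients;
- training uses plain full-batch gradient descent rather than Adam;
- the penalty is $\lambda \lVert W \rVert^2$;
- rows are learners, which is the transpose of the $d \times n$ layout above;
- the stage-wise fine-tuning step is omitted.

All function names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def train_tied_autoencoder(X, hidden_dim, lam=1e-4, lr=0.01, epochs=1000, seed=0):
    """Train one tied-weight autoencoder layer by full-batch gradient descent.

    Encoder: H = relu(X W^T + b). Decoder (assumed linear): X_hat = H W + c,
    i.e. the decoder reuses W (W^T in column-vector notation).
    Loss: (1/N) * sum_i ||x_i - x_hat_i||^2 + lam * ||W||^2.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(hidden_dim, d))
    b = np.zeros(hidden_dim)
    c = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = X @ W.T + b
        H = relu(Z)
        E = H @ W + c - X                      # reconstruction error
        losses.append(np.sum(E ** 2) / n + lam * np.sum(W ** 2))
        dX_hat = 2.0 * E / n
        dZ = (dX_hat @ W.T) * (Z > 0)          # back-propagate through the ReLU
        dW = H.T @ dX_hat + dZ.T @ X + 2.0 * lam * W  # decoder + encoder + penalty
        b -= lr * dZ.sum(axis=0)
        c -= lr * dX_hat.sum(axis=0)
        W -= lr * dW
    return (lambda A: relu(A @ W.T + b)), losses

def greedy_layerwise_pretrain(X, layer_dims, **kwargs):
    """Train autoencoders one after another; each is fed the previous layer's codes."""
    codes, encoders, history = X, [], []
    for k, hidden_dim in enumerate(layer_dims):
        encode, losses = train_tied_autoencoder(codes, hidden_dim, seed=k, **kwargs)
        encoders.append(encode)
        history.append(losses)
        codes = encode(codes)                  # H^(k) becomes the input of layer k+1
    return codes, encoders, history

rng = np.random.default_rng(42)
X = rng.random((200, 12))                      # hypothetical normalized activity features
H_C, encoders, history = greedy_layerwise_pretrain(X, [6, 3])
print(H_C.shape)                               # (200, 3)
```

`H_C` plays the role of $H^{(C)}$, the low-dimensional key features passed to the clustering step. In practice, an autodiff framework would replace the hand-written gradients.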

(3-2) Initial activity clustering. The affinity propagation algorithm first computes a learner similarity matrix from the Euclidean distance. It then iteratively updates the responsibility (attraction degree) and availability (attribution degree) between learners. Typical learners (exemplars) are identified by whether the sum of their responsibility and availability satisfies a given rule; these form the activity classes, and the remaining learners are assigned to them.

(3-2-1) Similarity matrix calculation. The similarity between learners is computed with the Euclidean distance on the low-dimensional key activity features extracted by the autoencoder in step (3-1).

The similarity s(i, j) between learner i and learner j is

$$s(i,j) = -\sum_{k=1}^{K}\left(x_{ik} - x_{jk}\right)^2,$$

where K is the dimension of the low-dimensional key features extracted in (3-1), and $x_{ik}$ and $x_{jk}$ are the values of the k-th low-dimensional key feature of learners i and j.

(3-2-2) Exemplar (cluster center) selection. Based on the similarity matrix, two quantities are computed iteratively. The responsibility reflects how well suited a learner is to serve as a cluster center. The availability reflects how appropriate it is for a learner to choose a given cluster center. Typical learners whose own responsibility and availability sum to more than 0 are selected as the activity cluster centers.

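The responsibility r(i, j) and availability a(i, j) are updated as

$$r(i,j) = s(i,j) - \max_{k \ne j}\{a(i,k) + s(i,k)\}, \qquad k \in \{1, 2, \dots, N\},$$

$$a(i,j) = \min\Big\{0,\; r(j,j) + \sum_{k \notin \{i,j\}} \max\{0, r(k,j)\}\Big\} \quad (i \ne j), \qquad a(j,j) = \sum_{k \ne j} \max\{0, r(k,j)\}.$$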

In the initial state, every learner is a potential cluster center, and every s(i, i) is set to the median of the similarity matrix. The iterative updates of r(i, j) and a(i, j) are stabilized by a damping factor λ (distinct from the λ in the autoencoder loss), which helps them converge. λ lies between 0 and 1 and weights the new and previous values of the responsibility and availability as follows:

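$$r_n \leftarrow (1-\lambda)\, r_n + \lambda\, r_{n-1},$$

$$a_n \leftarrow (1-\lambda)\, a_n + \lambda\, a_{n-1},$$

where $r_n$ and $a_n$ on the right-hand side are the values just computed in iteration n, and $r_{n-1}$ and $a_{n-1}$ are the values from the previous iteration.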

After the responsibilities and availabilities of all learners have been updated, learner i is selected as a cluster center if the sum of its own responsibility and availability is greater than 0:

$$r(i,i) + a(i,i) > 0.$$

(3-2-3) Initial cluster assignment. For each remaining learner, the sum of responsibility and availability with respect to each typical learner is computed. The learner is assigned to the activity cluster center with the largest sum, which completes the initial clustering of learner activity.
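Steps (3-2-1) to (3-2-3) can be sketched compactly in NumPy. The sketch assumes:
- negative squared Euclidean similarity, the usual choice for affinity propagation;
- the median-similarity preference and the damping rule above;
- the text's rule of assigning each learner to the exemplar with the largest r + a.

The function name and the toy data are hypothetical. For production use, scikit-learn's `AffinityPropagation` class is a maintained implementation.

```python
import numpy as np

def affinity_propagation(Z, damping=0.5, max_iter=200):
    """Affinity propagation on low-dimensional key features Z (learners x K)."""
    n = Z.shape[0]
    sq = np.sum(Z ** 2, axis=1)
    S = -(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T)   # s(i,j) = -||z_i - z_j||^2
    off_diag = ~np.eye(n, dtype=bool)
    np.fill_diagonal(S, np.median(S[off_diag]))        # preference s(i,i) = median
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(max_iter):
        # Responsibility: r(i,j) = s(i,j) - max_{k != j} [a(i,k) + s(i,k)]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best].copy()
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        max_other = np.repeat(first[:, None], n, axis=1)
        max_other[rows, best] = second
        R = (1 - damping) * (S - max_other) + damping * R
        # Availability: a(i,j) = min(0, r(j,j) + sum_{k not in {i,j}} max(0, r(k,j)))
        Rp = np.maximum(R, 0.0)
        np.fill_diagonal(Rp, np.diag(R))               # keep r(j,j) unclipped
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()             # a(j,j) = sum_{k != j} max(0, r(k,j))
        A_new = np.minimum(A_new, 0.0)
        np.fill_diagonal(A_new, self_avail)
        A = (1 - damping) * A_new + damping * A
    exemplars = np.flatnonzero(np.diag(R) + np.diag(A) > 0)  # r(i,i) + a(i,i) > 0
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax((R + A)[:, exemplars], axis=1)  # largest r + a, as in the text
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
Z = np.vstack([ctr + 0.1 * rng.standard_normal((10, 2)) for ctr in centers])  # toy features
exemplars, labels = affinity_propagation(Z, damping=0.8, max_iter=1000)
print(len(exemplars), labels)
```

A higher damping value trades convergence speed for stability. Here, `labels` indexes into `exemplars`, the learners chosen as typical learners of each activity class.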

(3-3) Activity clustering optimization. An auxiliary target distribution is constructed from the initial clustering result. The stacked autoencoder and the affinity propagation clustering are then trained jointly with a KL divergence loss, iteratively adjusting the autoencoder's hidden-layer weights and biases. This makes the key features extracted by the deep model better suited to cluster analysis and thereby optimizes the activity clustering result.

The soft assignment distribution Q of learners to clusters is computed from the key features produced by the last autoencoder layer and the centroids of the clusters in the initial clustering:

$$q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'}\left(1 + \lVert z_i - \mu_{j'} \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}},$$

where:
- $q_{ij}$ is the probability that learner i belongs to activity class j;
- $z_i$ is the key feature of learner i obtained in (3-1);
- $\mu_j$ is the feature of the typical learner of activity class j;
- α is the degrees-of-freedom parameter of the Student's t-distribution.

The auxiliary target distribution P is obtained by squaring and renormalizing the learner cluster distribution:

$$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}, \qquad f_j = \sum_i q_{ij}.$$

The difference between these two distributions over all learners is the KL divergence loss $L_{CLU}$ of the joint training:

$$L_{CLU} = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}.$$
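The three quantities of (3-3) can be evaluated with the NumPy sketch below. It only computes the loss. Propagating $L_{CLU}$ back into the autoencoder weights would require an autodiff framework, which is outside this sketch. The class centers `mu` are assumed to be the key features of the exemplar learners from (3-2); the function names and data are hypothetical.

```python
import numpy as np

def soft_assignment(Z, mu, alpha=1.0):
    """Student's t soft assignment q_ij of learner i to activity class j."""
    d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(Q):
    """Auxiliary target p_ij: square q_ij, divide by class frequency f_j, renormalize rows."""
    weight = Q ** 2 / Q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_clustering_loss(P, Q, eps=1e-12):
    """L_CLU = KL(P || Q) = sum_i sum_j p_ij * log(p_ij / q_ij)."""
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))

rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 3))      # hypothetical low-dimensional key features
mu = Z[[0, 7, 14]]                # hypothetical exemplar learners as class centers
Q = soft_assignment(Z, mu)
P = target_distribution(Q)
print(round(kl_clustering_loss(P, Q), 4))
```

Squaring $q_{ij}$ sharpens confident assignments. Dividing by $f_j$ keeps large classes from dominating the target.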

(4) Analyzing activity class characteristics.

This step builds on the learner activity clustering optimized in step (3). An association rule algorithm finds the features strongly associated with each activity class, and a text analysis method mines the semantic characteristics of each class from these features. A visualization method presents the activity distribution of online learners with different attributes.

(4-1) Activity association feature mining. The learner activity sequence features constructed in step (2) are discretized by percentile into categorical variables with 5 levels {very low, low, medium, high, very high}. They are then converted into a format suitable for association rule analysis, forming the feature set $X^{ap}$. The Apriori association rule algorithm computes the support, confidence and lift of itemsets that combine activity sequence features with each activity class, to find the typical features strongly associated with each class.
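The percentile discretization can be sketched as follows. The text says only "according to percentiles", so equal 20% quintile bins are an assumption here. The `feature=level` item format and the function name are hypothetical.

```python
import numpy as np

LEVELS = ["very low", "low", "medium", "high", "very high"]

def discretize_by_percentile(values, feature_name):
    """Map one numeric feature to 5 ordinal levels using the 20/40/60/80th percentiles.

    Returns items such as "weekly_login_count=high" for association rule mining.
    """
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, [0.2, 0.4, 0.6, 0.8])
    bins = np.searchsorted(edges, values, side="right")  # bin index 0..4
    return [f"{feature_name}={LEVELS[b]}" for b in bins]

items = discretize_by_percentile(range(10), "weekly_login_count")
print(items)
```

With many tied values, for example a feature that is 0 for most learners, some quintile edges coincide and the levels become uneven. Real data may need a tie-aware rule.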

Suppose that:
- $X_s \subseteq X^{ap}$ is a subset of the learner activity sequence feature set, such as {course learning duration in one week: very high, course learning frequency in one week: high, forum communication frequency in one week: low};
- $Y_c$, $c \in \{1, 2, \dots, k\}$, is a learner activity class, where k is the number of activity classes.

Association rule analysis consists of two main steps.

First, find the frequent itemsets of behavioral features related to $Y_c$. The proportion of learners in which activity class $Y_c$ and a specific behavior sequence feature itemset $X_s$ occur together is called the support of the itemset $X_s \cup Y_c$:

$$\mathrm{Support}(X_s \cup Y_c) = \frac{\left|\{\text{learners with } X_s \text{ and } Y_c\}\right|}{N},$$

where N is the total number of learners in the analysis sample. When the support exceeds a set threshold, $X_s \cup Y_c$ is called a frequent itemset. The threshold is generally set to 0.5 and can be adjusted to the actual situation.

Second, generate the rules associated with $Y_c$ from the frequent itemsets. The probability that a learner in activity class $Y_c$ has the features $X_s$ is called the confidence of the rule $Y_c \Rightarrow X_s$:

$$\mathrm{Confidence}(Y_c \Rightarrow X_s) = \frac{\mathrm{Support}(X_s \cup Y_c)}{\mathrm{Support}(Y_c)}.$$

When the confidence exceeds a set threshold, $Y_c \Rightarrow X_s$ can be considered a reliable rule; that is, $X_s$ is a set of features closely related to $Y_c$. The confidence threshold is generally set to 0.5 and can be adjusted to the actual situation.

After the confidence is computed, the lift is used to further screen the sequence features closely related to $Y_c$. Lift compares the probability that a learner with features $X_s$ belongs to activity class $Y_c$ with the overall proportion of learners in $Y_c$. A value greater than 1 means that learners with $X_s$ are more likely than average to be in $Y_c$, i.e., $X_s$ and $Y_c$ are closely related. The lift is computed as

$$\mathrm{Lift}(X_s, Y_c) = \frac{\mathrm{Support}(X_s \cup Y_c)}{\mathrm{Support}(X_s)\,\mathrm{Support}(Y_c)}.$$
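For a single candidate itemset and class, the three measures can be computed as in the sketch below. Apriori's candidate generation and pruning are omitted. Confidence follows the definition in the text, the share of class-$Y_c$ learners that have $X_s$. The function name, item strings and class labels are hypothetical.

```python
import numpy as np

def rule_metrics(learner_items, labels, X_s, c):
    """Support, confidence and lift for feature itemset X_s and activity class c.

    learner_items: list of item sets, one per learner (e.g. from percentile discretization).
    labels: activity class of each learner.
    Confidence follows the text: P(X_s | Y_c), the share of class-c learners having X_s.
    """
    has_X = np.array([X_s <= items for items in learner_items])
    in_c = np.asarray(labels) == c
    n = len(learner_items)
    support_xy = np.sum(has_X & in_c) / n
    support_x = has_X.mean()
    support_y = in_c.mean()
    confidence = support_xy / support_y
    lift = support_xy / (support_x * support_y)
    return support_xy, confidence, lift

A = "course_time_week=very high"
B = "forum_posts_week=low"
learner_items = [{A, B}, {A, B}, {A}, {B}, {A, B}]
labels = ["high", "high", "low", "high", "low"]
support, confidence, lift = rule_metrics(learner_items, labels, {A, B}, "high")
print(support, confidence, lift)
```

Here two of the five learners have both items and belong to the "high" class (support 0.4). Two of the three "high" learners have the items (confidence about 0.67), and the lift of about 1.11 is slightly above 1.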

(4-2) Semantic analysis of activity association features. A text analysis method computes the frequency of each activity class's strongly associated features and the co-occurrence frequency among them. The semantic characteristics of each class are then mined by visualizing these frequencies as word clouds and co-occurrence networks.

The association feature frequency is computed with the TF-IDF method:

$$tf_i^c = \frac{n_i^c}{N_c}, \qquad tfidf_i^c = tf_i^c \times idf_i^c,$$

where:
- $n_i^c$ is the number of learners in activity class $Y_c$ who have the strongly associated feature $X_i$;
- $N_c$ is the total number of learners in activity class $Y_c$;
- $idf_i^c$ is the inverse document frequency weight of $X_i$;
- $tfidf_i^c$ is the weighted frequency of the strongly associated feature $X_i$ among learners in activity class $Y_c$.
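The sketch below computes $tfidf_i^c$ from per-class counts. The idf definition is not recoverable from the text, so it assumes a class-level idf of $\log(k / k_i)$. This treats each activity class as a "document", with $k_i$ the number of classes in which feature $X_i$ is strongly associated. That choice, the function name and the sample counts are all hypothetical.

```python
import math

def class_tfidf(strong_features, counts, class_sizes):
    """Weighted frequency tfidf_i^c of each strongly associated feature per activity class.

    strong_features: {class: set of features strongly associated with it}
    counts: {(feature, class): n_i^c, learners of that class having the feature}
    class_sizes: {class: N_c}
    Assumed idf (not given in the text): log(k / k_i), where k is the number of
    classes and k_i the number of classes in which the feature is strongly associated.
    """
    k = len(strong_features)
    k_i = {}
    for feats in strong_features.values():
        for f in feats:
            k_i[f] = k_i.get(f, 0) + 1
    scores = {}
    for c, feats in strong_features.items():
        for f in feats:
            tf = counts[(f, c)] / class_sizes[c]
            scores[(f, c)] = tf * math.log(k / k_i[f])
    return scores

strong = {
    "high": {"course_time_week=very high", "login_count_week=medium"},
    "low": {"login_count_week=medium", "forum_posts_week=very low"},
}
counts = {
    ("course_time_week=very high", "high"): 30,
    ("login_count_week=medium", "high"): 20,
    ("login_count_week=medium", "low"): 40,
    ("forum_posts_week=very low", "low"): 10,
}
class_sizes = {"high": 50, "low": 80}
scores = class_tfidf(strong, counts, class_sizes)
print(scores)
```

Under this assumed idf, a feature that is strongly associated with every class scores 0. Such a feature does not help distinguish the classes in a word cloud.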

(4-3) Activity distribution visualization. Line charts, heat maps, bubble charts and calendar charts present the distribution of learner activity across demographic, regional affiliation and platform registration attributes.

Those matters not described in detail in this specification are well within the knowledge of those skilled in the art.

Those skilled in the art will readily appreciate that the foregoing is only a preferred embodiment of this invention and is not intended to limit the invention to the details shown. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An online learner activity evaluation method based on a deep clustering algorithm is characterized by comprising the following steps:

S1, collecting multi-source online learning information, namely: collecting learner attribute information including demographics, regional affiliation and platform registration from an online platform management system using database operations; collecting online behavior information on learner platform login, course learning, homework submission, forum communication and course resource evaluation using code instrumentation (event tracking); and acquiring online resource information on courses, homework, topic posts and course resources from an online platform using web crawler technology;

S2, constructing activity sequence features, namely: cleaning learner attribute, online behavior and online resource information using database queries and descriptive statistical analysis; mining high-dimensional online behavior sequence features over different time periods using time-domain descriptive statistics; and linearly transforming the sequence features according to a data normalization rule so that all features share a common scale;

S3, deep clustering modeling of activity: constructing a cascaded (stacked) autoencoder and extracting low-dimensional key features that capture the implicit associations among the learner's activity sequence features; identifying activity classes with an affinity propagation (neighbor propagation) algorithm and assigning each learner to an activity class; and, starting from the initial clustering of the activity classes, jointly training the autoencoder and the affinity propagation algorithm to optimize the activity clustering result;

S4, analyzing activity class characteristics: computing, with an association rule algorithm, the support, confidence and lift between the activity sequence features and each activity class, and identifying the typical features strongly associated with each class; mining the semantic characteristics of each class's typical features with a text analysis method; and presenting, with visualization methods, the distribution of learner activity across demographic, regional-affiliation and platform-registration attributes.

2. The online learner activity evaluation method based on a deep clustering algorithm according to claim 1, wherein step S1 of collecting multi-source online learning information specifically comprises:

S1.1, extracting learner attribute information: extracting, with SQL statements, from the database tables of the online platform management system the learner's demographic information (gender, age, grade and major), regional-affiliation information (codes of the school, county, city and province to which the learner belongs) and platform registration information (learner ID and registration time), wherein the learner ID is the index linking the learner's attributes with online behavior and online resource information;

S1.2, collecting online behavior information: collecting, by code instrumentation according to an event model, online behavior information on the learner's platform login, course learning, assignment submission, forum communication and course resource evaluation, forming a real-time behavior operation log; each log record comprises a learner ID, an operation object, a behavior event type, an operation time and an operation object ID, wherein the learner ID is the index linking online behavior with learner attribute information;

S1.3, crawling online resource information: acquiring, by web crawling, online resource information covering courses, assignments, topic posts and course resources from the online platform.
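As a minimal sketch of step S1.1, assume the management database is reachable through Python's built-in `sqlite3` module. Under that assumption, the attribute fields named in the claim can be pulled with one SQL statement and keyed by learner ID. The table name `learner_profile`, its column names and the single demo row are hypothetical, because the patent does not disclose the platform schema.

```python
import sqlite3

# Hypothetical table and column names; only the field meanings come from S1.1.
ATTRIBUTE_QUERY = """
    SELECT learner_id, gender, age, grade, major,
           school_code, county_code, city_code, province_code, registered_at
    FROM learner_profile
"""

def load_learner_attributes(conn: sqlite3.Connection) -> dict:
    """Return {learner_id: {column: value}} for every learner."""
    conn.row_factory = sqlite3.Row        # rows become addressable by column name
    rows = conn.execute(ATTRIBUTE_QUERY).fetchall()
    return {row["learner_id"]: dict(row) for row in rows}

def create_demo_db() -> sqlite3.Connection:
    """In-memory database with one made-up learner, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE learner_profile (learner_id TEXT PRIMARY KEY, gender TEXT, "
        "age INTEGER, grade TEXT, major TEXT, school_code TEXT, county_code TEXT, "
        "city_code TEXT, province_code TEXT, registered_at TEXT)"
    )
    conn.execute(
        "INSERT INTO learner_profile VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        ("L001", "F", 20, "2", "Education", "S01", "C01", "CT01", "P01", "2021-09-01"),
    )
    return conn
```

The result is keyed by `learner_id`, which mirrors the claim's use of the learner ID as the join index for behavior and resource data.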

3. The online learner activity evaluation method based on a deep clustering algorithm according to claim 2, wherein collecting the online behavior information in step S1.2 specifically comprises:

the behavior event types comprise logging in to and out of the platform; starting and ending course learning; submitting assignments; posting and replying in forum communication; and commenting, liking, forwarding and bookmarking in course resource evaluation;

the operation objects comprise the platform, courses, assignments, topic posts and course resources; the operation object ID is the collective term for the platform, course, assignment, topic post and course resource IDs, and is the index linking online behavior with resource information.
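The log record of S1.2 and the event and object types of claim 3 can be modeled as a small typed structure. This is a sketch under stated assumptions. The following are hypothetical:

- the enum member names;
- the raw field names read by `parse_record`;
- the `ALLOWED_OBJECTS` pairing of each event type with one operation object. This pairing is inferred from how claim 3 groups the events.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class EventType(Enum):
    LOGIN = "login"
    LOGOUT = "logout"
    COURSE_START = "course_start"
    COURSE_END = "course_end"
    SUBMIT = "submit"
    POST = "post"
    REPLY = "reply"
    COMMENT = "comment"
    LIKE = "like"
    FORWARD = "forward"
    BOOKMARK = "bookmark"

class ObjectType(Enum):
    PLATFORM = "platform"
    COURSE = "course"
    ASSIGNMENT = "assignment"
    TOPIC_POST = "topic_post"
    COURSE_RESOURCE = "course_resource"

# Assumed pairing of each event type with the operation object it acts on.
ALLOWED_OBJECTS = {
    EventType.LOGIN: ObjectType.PLATFORM, EventType.LOGOUT: ObjectType.PLATFORM,
    EventType.COURSE_START: ObjectType.COURSE, EventType.COURSE_END: ObjectType.COURSE,
    EventType.SUBMIT: ObjectType.ASSIGNMENT,
    EventType.POST: ObjectType.TOPIC_POST, EventType.REPLY: ObjectType.TOPIC_POST,
    EventType.COMMENT: ObjectType.COURSE_RESOURCE, EventType.LIKE: ObjectType.COURSE_RESOURCE,
    EventType.FORWARD: ObjectType.COURSE_RESOURCE, EventType.BOOKMARK: ObjectType.COURSE_RESOURCE,
}

@dataclass(frozen=True)
class BehaviorRecord:
    learner_id: str            # index into learner attribute information
    object_type: ObjectType
    event_type: EventType
    operation_time: datetime
    object_id: str             # index into online resource information

def parse_record(raw: dict) -> BehaviorRecord:
    """Build a record from one raw tracking event (hypothetical field names)."""
    event = EventType(raw["event_type"])
    obj = ObjectType(raw["object_type"])
    if ALLOWED_OBJECTS[event] is not obj:
        raise ValueError(f"{event.value} cannot target {obj.value}")
    return BehaviorRecord(raw["learner_id"], obj, event,
                          datetime.fromisoformat(raw["time"]), raw["object_id"])
```

Validating the event/object pairing at ingestion keeps malformed tracking events out of the later feature-mining step.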

4. The online learner activity evaluation method based on a deep clustering algorithm according to claim 2, wherein crawling the online resource information in step S1.3 specifically comprises:

the course information comprises a course ID, a name and a type; the types comprise general education, elective and compulsory courses; the course ID is the index linking a course with its assignment, topic post and course resource information;

the assignment information comprises an assignment ID, a type, a submission time and a course ID; the types comprise documents, audio and video; the course ID is the index linking the assignment with course information, and the assignment ID is the index linking the assignment with online behavior information;

the topic post information comprises a topic post ID, a publication time and a course ID; the course ID is the index linking the topic post with course information, and the topic post ID is the index linking the topic post with online behavior information;

the course resource information comprises a course resource ID, a type and a course ID; the types comprise documents, audio and video; the course ID is the index linking the course resource with course information, and the course resource ID is the index linking the course resource with online behavior information.
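Claim 4 chains the index fields. An operation object ID leads to an assignment, topic post or course resource, and that record's course ID leads to the course type. The sketch below assumes small in-memory dictionaries stand in for the crawled tables. The IDs, names and the `course_type_of` helper are hypothetical.

```python
from typing import Optional

# Hypothetical crawled tables, keyed by the ID fields named in claim 4.
COURSES = {"C1": {"name": "Statistics", "type": "compulsory"},
           "C2": {"name": "Art Appreciation", "type": "general"}}
ASSIGNMENTS = {"A1": {"type": "document", "course_id": "C1"}}
TOPIC_POSTS = {"T1": {"published": "2021-10-08", "course_id": "C2"}}
COURSE_RESOURCES = {"R1": {"type": "video", "course_id": "C1"}}

# Operation object types whose records carry a course ID.
CHILD_TABLES = {"assignment": ASSIGNMENTS, "topic_post": TOPIC_POSTS,
                "course_resource": COURSE_RESOURCES}

def course_type_of(object_type: str, object_id: str) -> Optional[str]:
    """Resolve a behavior record's operation object to its course type."""
    if object_type == "course":
        course_id = object_id
    elif object_type in CHILD_TABLES:
        course_id = CHILD_TABLES[object_type][object_id]["course_id"]
    else:
        return None   # platform events are not tied to any course
    return COURSES[course_id]["type"]
```

A lookup of this kind is what lets behavior counts be split by compulsory, elective and general education course type in S2.2.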

5. The online learner activity evaluation method based on a deep clustering algorithm according to claim 1, wherein step S2 of constructing activity sequence features specifically comprises:

S2.1, cleaning the multi-source information:

- cross-checking learner attribute and online resource information by database queries and descriptive statistical analysis on the index fields;
- filtering abnormal data and filling missing values;
- following the chronological order of the log records, treating the first record of each type of active behavior operation as a valid record;
- computing, one by one, the time interval between each record and the preceding valid record of the same type, and deleting as abnormal any record whose interval is below a set threshold;

S2.2, mining sequence features:

- classifying the platform login, course learning, assignment submission, forum communication and course resource evaluation behaviors and linking them with resource information through the index fields;
- transforming the real-time behavior operation logs by time-domain descriptive statistical analysis;
- mining frequency-type and duration-type sequence features of each kind of online behavior over multiple time periods (day, week and month) and multiple course types (compulsory, elective and general education);

S2.3, normalizing and combining the sequence features: linearly transforming the frequency-type and duration-type time-series features of each kind of online behavior by min-max normalization to unify their scales; and concatenating, by learner ID, the time-series features of all online behaviors in each period to form high-dimensional activity sequence features.
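The threshold rule at the end of S2.1 can be sketched as follows. The sketch makes two assumptions, and the claim fixes neither of them:

- "the same type" of record means the same learner and the same behavior event type;
- the threshold is a `timedelta`.

The claim also gives no threshold value.

```python
from datetime import datetime, timedelta

def filter_rapid_repeats(records, threshold):
    """Drop records that follow the last valid record of the same type too closely.

    records: iterable of (learner_id, event_type, timestamp) tuples.
    threshold: datetime.timedelta; intervals below it are treated as abnormal.
    """
    last_valid = {}   # (learner_id, event_type) -> time of the last kept record
    kept = []
    for learner_id, event_type, ts in sorted(records, key=lambda r: r[2]):
        key = (learner_id, event_type)
        prev = last_valid.get(key)
        # the first record of each type is always valid; later ones must be
        # at least `threshold` after the previous *valid* record of that type
        if prev is None or ts - prev >= threshold:
            kept.append((learner_id, event_type, ts))
            last_valid[key] = ts
    return kept
```

Measuring against the last *valid* record, rather than the immediately preceding one, follows the claim's wording. As a result, a burst of rapid clicks collapses to its first record.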

6. The online learner activity evaluation method based on a deep clustering algorithm according to claim 5, wherein mining the sequence features in step S2.2 specifically comprises:

platform login features: computing the duration of each login session from the interval between the learner's paired login and logout operations on the platform; and, from the learner's platform login records and the corresponding session durations, computing the number of logins and the total login time in each period;

course learning features: computing the duration of each learning session from the interval between the start and end of the learner's learning of the same course; and, from the learner's course learning start records and session durations, computing the number of sessions and the total learning time for general education, compulsory and elective courses in each period;

assignment submission features: computing, from the learner's assignment submission records, the number of document, audio and video assignments submitted for each course type in each period;

forum communication features: computing, from the learner's posting and replying records in forum discussions of course content, the number of posts and replies for each course type in each period;

course resource evaluation features: computing, from the learner's course resource evaluation records, the number of comments, likes, forwards and bookmarks the learner gives to document, audio and video course resources in each period.
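A minimal sketch of the platform login features of claim 6 and the min-max normalization of S2.3 is given below, using day-level periods. It rests on these assumptions:

- a session is a login followed by the same learner's next logout;
- a session is credited to the day on which it starts;
- unmatched events are dropped.

The helper names are hypothetical. Weekly or monthly periods would only change the grouping key.

```python
from collections import defaultdict
from datetime import datetime

def daily_login_features(events):
    """Per (learner_id, day): (login count, total session seconds).

    events: iterable of (learner_id, 'login' | 'logout', datetime).
    Logouts without an open login are ignored, and a login with no logout
    is replaced by that learner's next login (assumptions, not from the claim).
    """
    open_session = {}                       # learner_id -> start of open session
    feats = defaultdict(lambda: [0, 0.0])   # (learner_id, day) -> [count, seconds]
    for learner_id, event, ts in sorted(events, key=lambda e: e[2]):
        if event == "login":
            open_session[learner_id] = ts
        elif event == "logout" and learner_id in open_session:
            start = open_session.pop(learner_id)
            f = feats[(learner_id, start.date())]   # period = day the session began
            f[0] += 1
            f[1] += (ts - start).total_seconds()
    return {key: (count, secs) for key, (count, secs) in feats.items()}

def min_max(values):
    """Min-max normalization to [0, 1]; a constant feature maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

Each normalized column (login count, total time, and so on, per period) would then be concatenated by learner ID to form the high-dimensional vector described in S2.3.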

7. The online learner activity evaluation method based on a deep clustering algorithm according to claim 1, wherein the deep clustering modeling of activity in step S3 specifically comprises:

S3.1, extracting key activity information: constructing a multilayer stacked autoencoder and, by greedy layer-wise training, extracting low-dimensional key features that capture the implicit associations within the high-dimensional activity sequence features derived from the learner's platform login, course learning, assignment submission, forum communication and course resource evaluation;

S3.2, initial activity clustering:

- computing the learner similarity matrix with the Euclidean distance formula, using the affinity propagation (neighbor propagation) algorithm;
- iteratively updating the attraction (responsibility) and attribution (availability) values between learners;
- identifying typical (exemplar) learners according to whether the sum of the two values satisfies a given rule, and forming the activity classes from them;
- assigning the remaining learners to the activity classes;

S3.3, optimizing the activity clustering: constructing an auxiliary target distribution from the initial clustering result, then jointly training the stacked autoencoder and the affinity propagation clustering with a KL-divergence objective. During this training, the autoencoder's hidden-layer weights and bias parameters are adjusted iteratively, so that the key features extracted by the deep model better suit the cluster analysis and the activity clustering result is optimized.
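The sketch below illustrates the two learning components of claim 7 with NumPy, under simplifying assumptions that the patent does not specify:

- S3.1 uses sigmoid layers, squared-error reconstruction and full-batch gradient descent for the greedy layer-wise training.
- S3.3 uses a Student's t soft assignment, together with the sharpened auxiliary target distribution and KL objective in the style of Deep Embedded Clustering (DEC).

The joint fine-tuning step, which would backpropagate the KL loss into the encoder weights, is omitted. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_layer(x, n_hidden, epochs=300, lr=0.5, seed=0):
    """Train one sigmoid autoencoder layer by full-batch gradient descent on MSE.
    Returns the encoder weights and bias; the decoder is discarded."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1, b1 = rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(n_hidden)
    w2, b2 = rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(d)
    for _ in range(epochs):
        h = sigmoid(x @ w1 + b1)                 # encode
        x_hat = sigmoid(h @ w2 + b2)             # decode
        # gradient of mean squared error through the output sigmoid
        g_out = (2.0 / n) * (x_hat - x) * x_hat * (1.0 - x_hat)
        g_hid = (g_out @ w2.T) * h * (1.0 - h)   # backpropagate to hidden layer
        w2 -= lr * h.T @ g_out
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * x.T @ g_hid
        b1 -= lr * g_hid.sum(axis=0)
    return w1, b1

def greedy_stack(x, layer_sizes):
    """Greedy layer-wise training: each layer learns to reconstruct the
    codes produced by the layer below it. Returns (params, final codes)."""
    params, h = [], x
    for k, size in enumerate(layer_sizes):
        w, b = train_layer(h, size, seed=k)
        params.append((w, b))
        h = sigmoid(h @ w + b)
    return params, h

def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q_ij of embedded point i to center j."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Auxiliary target: p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q), the objective a joint fine-tuning step would minimize."""
    return float(np.sum(p * np.log(p / q)))
```

Here the cluster centers passed to `soft_assign` would come from the exemplars found in S3.2. Squaring and renormalizing `q` sharpens confident assignments, which is what drives the encoder toward cluster-friendly features during fine-tuning.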

8. The online learner activity evaluation method based on a deep clustering algorithm according to claim 7, wherein the initial activity clustering in step S3.2 specifically comprises:

S3.2.1, computing the similarity matrix: computing the similarity between learners with the Euclidean distance formula, based on the low-dimensional key activity features extracted by the autoencoder in S3.1;

S3.2.2, selecting cluster centers:

- iteratively computing, from the similarity matrix, the attraction (responsibility) and the attribution (availability). The attraction measures how well suited a learner is to serve as a cluster center. The attribution measures how appropriate it is for a learner to belong to a given cluster center.
- selecting as activity cluster centers the typical learners for whom the sum of their own attribution and attraction values is greater than 0;

S3.2.3, initial cluster assignment: computing the sum of the attraction and attribution values between each remaining learner and each typical learner, and assigning each remaining learner to the activity cluster center with the largest sum, thereby obtaining the initial clustering of learner activity.
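Claim 8 follows the standard affinity propagation message-passing scheme, so it can be sketched compactly in NumPy. The sketch assumes the following common conventions rather than values given in the patent:

- similarity is the negative squared Euclidean distance;
- the self-similarity (preference) defaults to the median off-diagonal similarity;
- messages are damped.

Exemplars are selected by r(k,k) + a(k,k) > 0, and remaining learners are assigned by the largest r + a, as S3.2.2 and S3.2.3 describe. The function name and parameters are hypothetical.

```python
import numpy as np

def affinity_propagation(z, damping=0.5, iterations=300, preference=None):
    """Affinity propagation on the rows of z; returns (exemplar indices, labels)."""
    n = z.shape[0]
    # similarity: negative squared Euclidean distance (a common AP convention)
    s = -((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    if preference is None:
        preference = np.median(s[~np.eye(n, dtype=bool)])
    np.fill_diagonal(s, preference)   # self-similarity controls the number of clusters
    r = np.zeros((n, n))              # attraction (responsibility)
    a = np.zeros((n, n))              # attribution (availability)
    rows = np.arange(n)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        as_sum = a + s
        best = np.argmax(as_sum, axis=1)
        first = as_sum[rows, best]
        as_sum[rows, best] = -np.inf
        second = as_sum.max(axis=1)
        max_other = np.repeat(first[:, None], n, axis=1)
        max_other[rows, best] = second
        r = damping * r + (1 - damping) * (s - max_other)
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        rp = np.maximum(r, 0)
        np.fill_diagonal(rp, np.diag(r))
        a_new = rp.sum(axis=0)[None, :] - rp
        self_avail = np.diag(a_new).copy()
        a_new = np.minimum(a_new, 0)
        np.fill_diagonal(a_new, self_avail)
        a = damping * a + (1 - damping) * a_new
    evidence = r + a
    exemplars = np.flatnonzero(np.diag(evidence) > 0)   # rule of S3.2.2
    if exemplars.size == 0:
        raise RuntimeError("no exemplars found; try more iterations or higher damping")
    labels = exemplars[np.argmax(evidence[:, exemplars], axis=1)]   # rule of S3.2.3
    labels[exemplars] = exemplars    # each exemplar labels itself
    return exemplars, labels
```

Unlike k-means, affinity propagation does not take the number of activity classes as input. The preference value indirectly controls how many classes emerge, and this is presumably why the claim can "identify" the classes rather than fix them in advance.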

9. The online learner activity evaluation method based on a deep clustering algorithm according to claim 1, wherein analyzing the activity class characteristics in step S4 specifically comprises:

S4.1, mining activity association features: converting the learner activity sequence features constructed in S2 into feature item sets by data discretization; computing, with the Apriori association rule algorithm, the support, confidence and lift of item sets that combine activity sequence features with each activity class; and identifying the typical features strongly associated with each activity class;

S4.2, semantic analysis of activity association features: computing, by text analysis, how often each activity class's strongly associated features occur and how often features co-occur, and visualizing the results as word clouds and co-occurrence networks to mine the semantic characteristics of each activity class;

S4.3, visualizing the activity distribution: presenting, with line graphs, heat maps, bubble maps and calendar maps, the distribution of learner activity across demographic, regional-affiliation and platform-registration attributes.
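Steps S4.1 and S4.2 can be sketched in plain Python:

1. Discretize each learner's normalized features into items, and add the learner's activity class as one more item.
2. Mine frequent itemsets with Apriori.
3. Keep the rules "features -> class" that pass the confidence and lift thresholds.
4. Count how often each feature and each feature pair appears in a class's strong rules. These are the counts a word cloud and a co-occurrence network would be drawn from.

The two-bin discretization, the `activity=` item prefix, all thresholds and the helper names are assumptions.

```python
from collections import Counter
from itertools import combinations

def discretize(features, cut=0.5):
    """Two-bin discretization of min-max normalized features into items (assumed bins)."""
    return frozenset(f"{name}={'high' if value >= cut else 'low'}"
                     for name, value in features.items())

def apriori(transactions, min_support):
    """Level-wise Apriori search; returns {frozenset itemset: support}."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    singletons = {frozenset([item]) for t in transactions for item in t}
    level = {s: support(s) for s in singletons}
    level = {s: v for s, v in level.items() if v >= min_support}
    frequent, k = {}, 1
    while level:
        frequent.update(level)
        k += 1
        # join step: unions of frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))}
        level = {c: support(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
    return frequent

def class_rules(frequent, min_confidence, label_prefix="activity="):
    """Strong rules 'feature items -> class': (antecedent, class, support, confidence, lift)."""
    rules = []
    for itemset, supp in frequent.items():
        labels = [item for item in itemset if item.startswith(label_prefix)]
        if len(labels) != 1 or len(itemset) < 2:
            continue
        label = frozenset(labels)
        antecedent = itemset - label
        confidence = supp / frequent[antecedent]   # antecedent is frequent by downward closure
        lift = confidence / frequent[label]
        if confidence >= min_confidence and lift > 1.0:
            rules.append((antecedent, labels[0], supp, confidence, lift))
    return rules

def feature_statistics(rules, activity_class):
    """Occurrence counts of features and feature pairs across one class's strong rules."""
    occurrence, cooccurrence = Counter(), Counter()
    for antecedent, label, *_ in rules:
        if label == activity_class:
            occurrence.update(antecedent)
            cooccurrence.update(combinations(sorted(antecedent), 2))
    return occurrence, cooccurrence
```

Requiring lift > 1 filters out rules whose confidence only reflects a large class. Such a class would predict itself regardless of the features.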