CN112085099B - Distributed student clustering integration method and system

Info

Publication number: CN112085099B
Application number: CN202010943424.7A
Authority: CN
Inventors: 谢涛; 张春炯; 龚朝花
Original assignee: Southwest University
Current assignee: Southwest University
Priority date: 2020-09-09
Filing date: 2020-09-09
Publication date: 2022-05-17
Anticipated expiration: 2040-09-09
Also published as: CN112085099A

Abstract

The invention discloses a distributed student clustering integration method and system, and relates to the technical field of network education. It addresses two problems of existing online education platforms: learning resources cannot be accurately recommended to users, and the system crashes easily because of the large amount of data to be computed. The technical scheme is as follows: original behavior data of a user is acquired and analyzed through a naive Bayes model, and behavior characteristics are extracted. User behaviors are accurately clustered so that the similarity between cluster objects is small and the similarity within clusters is large, which provides technical support for an online education platform to accurately recommend learning resources to users, reduces recommendation errors, and improves the user experience. Distributed calculation is performed on all user behaviors, which reduces the operating load of the online platform and keeps it running normally. The classified and aggregated user behavior data is fused and identified multiple times, which reduces the amount of calculation required for classification and aggregation and improves the response speed of the online education system.

Description

Distributed student clustering integration method and system

Technical Field

The invention relates to the technical field of network education, in particular to a distributed student clustering integration method and system.

Background

With the rapid development of internet technology and the large-scale popularization of intelligent terminal devices such as smartphones and tablet computers, mobile network resources such as 4G are no longer scarce, and digital, mobile online learning has become a new way for people to receive education. Online education, also known as distance education or online learning, refers to a method of disseminating content and learning quickly by applying information technology and internet technology. Compared with traditional education, network education offers flexible learning times, unrestricted learning locations, well-targeted content, efficient online interaction, and the ability to repeat lessons.

In recent years online education has become increasingly popular with users. Most online education platforms recommend related learning resources directly according to user behaviors; no analysis of the user behaviors is needed and the investment cost is low, but the recommended learning resources change whenever the user behaviors change, so accurate recommendation is not possible, and users are easily put off. A small number of online education platforms analyze user behaviors, classify them, and make recommendations according to the classification results, which meets user requirements to a certain extent. However, with conventional clustering integration methods the similarity between clusters is large and the similarity within clusters is small, so the recommendations still contain large errors. Meanwhile, as the number of intelligent terminals increases, the amount of data to be computed grows, which easily causes system crashes and slow response.

Therefore, how to design a distributed student clustering integration method and system that provides technical support for accurate learning resource recommendation in online education is a problem that urgently needs to be solved.

Disclosure of Invention

In order to solve the problems that existing online education platforms cannot accurately recommend learning resources to users and that the system crashes easily because of the large amount of data to be computed, the invention aims to provide a distributed student clustering integration method and system.

The technical purpose of the invention is realized by the following technical scheme:

In a first aspect, a distributed student clustering integration method is provided, which includes the following steps:

S1: acquiring original behavior data of a user, analyzing the original behavior data through a naive Bayes model, and extracting behavior characteristics;

S2: performing, by a lower-level server, a first classification and aggregation of the behavior characteristics of all users in its service area, taking the same specific characteristics as the criterion, to form a plurality of cluster objects, wherein the same specific characteristics are selected using the Pearson correlation coefficient as the criterion; using the same specific characteristics as the cluster identifier of the corresponding cluster object; and removing the same specific characteristics from each user in the cluster object to form a first cluster object;

S3: fusing, by an upper-level server, all first cluster objects with the same cluster identifier in the lower-level servers to form a new first cluster object; performing a second classification and aggregation, taking the same specific characteristics as the criterion, to form a plurality of new cluster objects; adding the same specific characteristics to the first cluster identifier of the corresponding new cluster object to form a second cluster identifier; and removing the same specific characteristics from each user in the new cluster object to form a second cluster object;

S4: repeating step S3 with the upper-level server of step S3 acting as a lower-level server, until the cluster identifier of the cluster object is fully formed;

S5: counting and storing the user information of the users whose clustering is finished, together with the corresponding cluster identifiers.

Preferably, in step S1, non-associated data elimination and dimensionality reduction are performed on the original behavior data by principal component analysis (PCA) to obtain associated original behavior data.

Preferably, in step S3, the upper-level server performs a second fusion on all second cluster objects with the same second cluster identifier to form a new second cluster object.

Preferably, when the behavior characteristics of a user simultaneously include the same specific characteristics of two or more cluster objects, weights are calculated for the same specific characteristics of those cluster objects, and the user is classified and aggregated according to the same specific characteristics with the larger weight.

Preferably, once the cluster identifier of a cluster object is fully formed, clustering classification of the corresponding user is stopped.

In a second aspect, a distributed student clustering integration system is provided, which comprises user terminals, a database and at least two levels of servers, wherein a plurality of user terminals are arranged in the service area of each lower-level server, and a plurality of lower-level servers are arranged in the service area of each upper-level server;

the user terminal is used for acquiring original behavior data of a user, analyzing the original behavior data through a naive Bayes model, and extracting behavior characteristics;

the lower-level server is used for performing a first classification and aggregation of the behavior characteristics of all users in its service area, taking the same specific characteristics as the criterion, to form a plurality of cluster objects, wherein the same specific characteristics are selected using the Pearson correlation coefficient as the criterion; using the same specific characteristics as the cluster identifier of the corresponding cluster object; and removing the same specific characteristics from each user in the cluster object to form a first cluster object;

the upper-level server is used for fusing all first cluster objects with the same cluster identifier in the lower-level servers to form a new first cluster object; performing a second classification and aggregation, taking the same specific characteristics as the criterion, to form a plurality of new cluster objects; adding the same specific characteristics to the first cluster identifier of the corresponding new cluster object to form a second cluster identifier; removing the same specific characteristics from each user in the new cluster object to form a second cluster object; and repeating the classification and aggregation with the upper-level server acting as a lower-level server until the cluster identifier of the cluster object is fully formed;

and the database is used for counting and storing the user information of the users whose clustering is finished, together with the corresponding cluster identifiers.

Preferably, the user terminal further includes a principal component analysis module, which performs non-associated data elimination and dimensionality reduction on the original behavior data to obtain associated original behavior data.

Preferably, the upper-level server is further configured to perform a second fusion on all second cluster objects with the same second cluster identifier to form a new second cluster object.

Preferably, the server further comprises a weight calculation module; when the behavior characteristics of a user simultaneously include the same specific characteristics of two or more cluster objects, the weight calculation module calculates weights for the same specific characteristics of those cluster objects, and the user is classified and aggregated according to the same specific characteristics with the larger weight.

Preferably, the servers are all in communication connection with the database; and once the cluster identifier of a cluster object is fully formed, clustering classification of the corresponding user is stopped.

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention clusters user behaviors accurately, so that the similarity between cluster objects is small and the similarity within clusters is large; this provides technical support for an online education platform to accurately recommend learning resources to users, reduces recommendation errors, and improves the user experience;

(2) The invention performs distributed calculation on all user behaviors, which reduces the operating load of the online platform and keeps it running normally;

(3) The invention fuses and identifies the classified and aggregated user behavior data multiple times, which reduces the amount of calculation required for classification and aggregation and improves the response speed of the online education system.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of Embodiment 1 of the present invention;

FIG. 2 is an architecture diagram of Embodiment 2 of the present invention.

Detailed Description

In order to make the technical problems to be solved, the technical solutions and the advantageous effects of the present invention more clearly understood, the present invention is further described in detail below with reference to FIGS. 1-2 and Embodiments 1-2.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.

Embodiment 1: a distributed student clustering integration method, as shown in FIG. 1, includes the following steps:

Step one, original behavior data of a user is obtained; non-associated data elimination and dimensionality reduction are performed on the original behavior data by principal component analysis (PCA) to obtain associated original behavior data; the associated original behavior data is analyzed through a naive Bayes model; and behavior characteristics are then extracted. The original behavior data may be the operation footprints, operation sequences, dwell times, first points of attention, click rates, login times, login locations, user information and the like recorded after the user logs in to the online education platform. The original behavior data may also be online test scores, past test scores and the like.
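As a rough illustration of the naive Bayes part of step one, the sketch below trains a small categorical naive Bayes model with Laplace smoothing and uses it to map one raw behavior record to a behavior characteristic label. The patent does not specify the model inputs, the training data, or how characteristic labels are defined, so the field names (`dwell`, `clicks`), the labels, and the helper names `train_nb` and `predict_nb` are all assumptions made for this example. The PCA step is omitted.

```python
import math
from collections import Counter, defaultdict


def train_nb(records, labels):
    """Count label and per-field value frequencies for a categorical naive Bayes model."""
    label_counts = Counter(labels)
    # value_counts[label][field][value] = number of training records with that value
    value_counts = defaultdict(lambda: defaultdict(Counter))
    field_values = defaultdict(set)  # distinct values seen for each field
    for record, label in zip(records, labels):
        for field, value in record.items():
            value_counts[label][field][value] += 1
            field_values[field].add(value)
    return label_counts, value_counts, field_values


def predict_nb(model, record):
    """Return the label with the highest log-posterior (Laplace smoothing, alpha = 1)."""
    label_counts, value_counts, field_values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        for field, value in record.items():
            seen = value_counts[label][field][value]
            score += math.log((seen + 1) / (count + len(field_values[field])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: discretized raw behavior and an assumed characteristic label.
records = [
    {"dwell": "long", "clicks": "many"},
    {"dwell": "long", "clicks": "few"},
    {"dwell": "short", "clicks": "many"},
    {"dwell": "short", "clicks": "few"},
]
labels = ["A", "A", "B", "B"]

model = train_nb(records, labels)
extracted = predict_nb(model, {"dwell": "long", "clicks": "many"})
```

In this toy data the label depends only on dwell time, so a record with a long dwell time is mapped to characteristic A regardless of click count.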

Step two, a lower-level server performs a first classification and aggregation of the behavior characteristics of all users in its service area, taking the same specific characteristics as the criterion, to form a plurality of cluster objects, wherein the same specific characteristics are selected using the Pearson correlation coefficient as the criterion. The same specific characteristics are used as the cluster identifier of the corresponding cluster object, and the same specific characteristics are removed from all users in the cluster object to form a first cluster object. When the behavior characteristics of a user simultaneously include the same specific characteristics of two or more cluster objects, weights are calculated for the same specific characteristics of those cluster objects, and the user is classified and aggregated according to the same specific characteristics with the larger weight.
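The text states only that the same specific characteristics are selected using the Pearson correlation coefficient as the criterion; it does not give the procedure. One possible reading, sketched below, treats each characteristic as a 0/1 indicator across users and regards pairs of characteristics whose indicators correlate strongly as candidate same specific characteristics. The threshold of 0.8, the sample users, and the function names `pearson` and `correlated_feature_pairs` are assumptions for illustration.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_feature_pairs(user_features, threshold=0.8):
    """List pairs of characteristics whose presence across users is strongly correlated."""
    all_features = sorted(set().union(*user_features.values()))
    users = sorted(user_features)
    # indicator[f][i] is 1 when user i has characteristic f, else 0
    indicator = {
        f: [1 if f in user_features[u] else 0 for u in users] for f in all_features
    }
    return [
        (f, g)
        for f, g in combinations(all_features, 2)
        if pearson(indicator[f], indicator[g]) >= threshold
    ]


# Hypothetical users; not the users of the worked example in the text.
user_features = {
    "user1": {"A", "C", "B"},
    "user2": {"A", "C"},
    "user3": {"Q", "H"},
    "user4": {"Q", "H", "B"},
}
pairs = correlated_feature_pairs(user_features)
```

With this data, A and C always occur together, as do H and Q, so those two pairs are returned as candidates, while B correlates with nothing.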

For example, suppose the behavior characteristics of user 1 are A, B, C, D, E, F, those of user 2 are A, C, D, Q, H, and those of user 3 are A, B, E, Q, H. The same specific characteristics may be one or more characteristics. Taking A and C as the same specific characteristics, user 1 and user 2 may form a first cluster object identified as AC, in which the remaining behavior characteristics of user 1 are B, D, E, F and those of user 2 are D, Q, H.

Likewise, taking Q and H as the same specific characteristics, user 2 and user 3 may form a first cluster object identified as QH, in which the remaining behavior characteristics of user 2 are A, C, D and those of user 3 are A, B, E.
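The two groupings above can be reproduced with a short sketch. It assumes behavior characteristics are represented as Python sets and a cluster identifier as a frozenset of the same specific characteristics; the function name `form_cluster_object` is hypothetical.

```python
def form_cluster_object(user_features, shared):
    """Group the users that have every characteristic in `shared`.

    Returns the cluster identifier and, for each member, the characteristics
    that remain after the shared ones are removed.
    """
    shared = frozenset(shared)
    members = {
        user: set(features) - shared
        for user, features in user_features.items()
        if shared <= set(features)  # user must contain all shared characteristics
    }
    return shared, members


# Users 1 to 3 from the worked example in the text.
user_features = {
    "user1": {"A", "B", "C", "D", "E", "F"},
    "user2": {"A", "C", "D", "Q", "H"},
    "user3": {"A", "B", "E", "Q", "H"},
}

ac_id, ac_members = form_cluster_object(user_features, {"A", "C"})
qh_id, qh_members = form_cluster_object(user_features, {"Q", "H"})
```

Here `ac_members` contains user 1 and user 2, and `qh_members` contains user 2 and user 3, so user 2 is a candidate for both cluster objects.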

If two cluster objects are formed by taking AC and QH as the same specific characteristics, the behavior characteristics of user 2 include both AC and QH. In that case, the weights of A and C are calculated and averaged, and the weights of Q and H are calculated and averaged; if the average for AC is larger, user 2 is assigned to the first cluster object identified as AC, and otherwise user 2 is assigned to the first cluster object identified as QH.
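A minimal sketch of this tie-breaking rule follows. The patent does not say how the weights are obtained, so the weight values below are invented for illustration, and `pick_cluster` is a hypothetical helper. When the averages are equal, this sketch keeps the first candidate listed, which is also an assumption.

```python
from statistics import mean


def pick_cluster(candidate_ids, weights):
    """Return the candidate identifier whose characteristics have the largest mean weight."""
    return max(candidate_ids, key=lambda cid: mean(weights[f] for f in cid))


# Hypothetical weights for the characteristics involved in the conflict.
weights = {"A": 0.9, "C": 0.7, "Q": 0.6, "H": 0.4}

# User 2 matches both the AC and the QH cluster objects.
chosen = pick_cluster([("A", "C"), ("Q", "H")], weights)
```

With these weights the average for AC is about 0.8 and the average for QH is about 0.5, so user 2 is assigned to the AC cluster object.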

Step three, an upper-level server fuses all first cluster objects with the same cluster identifier in the lower-level servers to form a new first cluster object; performs a second classification and aggregation, taking the same specific characteristics as the criterion, to form a plurality of new cluster objects; adds the same specific characteristics to the first cluster identifier of the corresponding new cluster object to form a second cluster identifier; and removes the same specific characteristics from all users in the new cluster object to form a second cluster object.

The upper-level server then performs a second fusion on all second cluster objects with the same second cluster identifier to form a new second cluster object.

Assume that one lower-level server forms a first cluster object identified as AC (AC; user 1: B, D, E, F; user 2: D, Q, H) and a first cluster object identified as QH (QH; user 3: B, D, E, F; user 4: B, D). Another lower-level server forms a first cluster object identified as AC (AC; user 5: B, D, E, F; user 6: B, D, Q, H) and a first cluster object identified as BD (BD; user 7: A, C, E, F, Q, H; user 8: A, C, Q, H).

The first fusion of these four first cluster objects yields three new first cluster objects: one identified as AC (AC; user 1: B, D, E, F; user 2: D, Q, H; user 5: B, D, E, F; user 6: B, D, Q, H), one identified as QH (QH; user 3: B, D, E, F; user 4: B, D), and one identified as BD (BD; user 7: A, C, E, F, Q, H; user 8: A, C, Q, H).
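The first fusion in this example amounts to merging cluster objects by identifier. A minimal sketch, assuming each cluster object is an (identifier, members) pair and that identifiers are compared as unordered sets of characteristics; the name `fuse` is hypothetical:

```python
from collections import defaultdict


def fuse(cluster_objects):
    """Merge cluster objects that carry the same identifier."""
    fused = defaultdict(dict)
    for identifier, members in cluster_objects:
        # frozenset("AC") == {"A", "C"}, so each letter is one characteristic
        fused[frozenset(identifier)].update(members)
    return dict(fused)


# First cluster objects reported by the two lower-level servers in the example.
server_1 = [
    ("AC", {"user1": {"B", "D", "E", "F"}, "user2": {"D", "Q", "H"}}),
    ("QH", {"user3": {"B", "D", "E", "F"}, "user4": {"B", "D"}}),
]
server_2 = [
    ("AC", {"user5": {"B", "D", "E", "F"}, "user6": {"B", "D", "Q", "H"}}),
    ("BD", {"user7": {"A", "C", "E", "F", "Q", "H"}, "user8": {"A", "C", "Q", "H"}}),
]

fused = fuse(server_1 + server_2)
```

The result holds three cluster objects: AC with users 1, 2, 5 and 6, QH with users 3 and 4, and BD with users 7 and 8.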

The first cluster object identified as QH is then classified and aggregated taking BD as the same specific characteristics, which gives a second cluster object identified as QHBD (QHBD; user 3: E, F; user 4: none). The first cluster object identified as BD is classified and aggregated taking QH as the same specific characteristics, which gives a second cluster object identified as BDQH (BDQH; user 7: A, C, E, F; user 8: A, C). The identifiers QHBD and BDQH consist of the same characteristics, so the two second cluster objects are treated as having the same cluster identifier, and the second fusion yields a new second cluster object identified as BDQH, written here as QHBD (QHBD; user 3: E, F; user 4: none; user 7: A, C, E, F; user 8: A, C).
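The second classification and the second fusion can be sketched as follows. Representing an identifier as a frozenset is an assumption made here so that QHBD and BDQH compare equal, which matches how the text treats them; the function name `reclassify` is hypothetical.

```python
def reclassify(identifier, members, shared):
    """Second classification: keep users containing `shared`, extend the identifier,
    and strip the shared characteristics from each remaining user."""
    shared = frozenset(shared)
    new_id = frozenset(identifier) | shared
    new_members = {
        user: features - shared
        for user, features in members.items()
        if shared <= features
    }
    return new_id, new_members


# New first cluster objects QH and BD from the first fusion in the example.
qh = (frozenset("QH"), {"user3": {"B", "D", "E", "F"}, "user4": {"B", "D"}})
bd = (
    frozenset("BD"),
    {"user7": {"A", "C", "E", "F", "Q", "H"}, "user8": {"A", "C", "Q", "H"}},
)

id_1, members_1 = reclassify(*qh, shared="BD")
id_2, members_2 = reclassify(*bd, shared="QH")

# Second fusion: merge only when the two second identifiers match.
merged = {**members_1, **members_2} if id_1 == id_2 else None
```

After the merge, user 4 has an empty set of remaining characteristics, while users 3, 7 and 8 still carry characteristics that a further round could use.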

Step four, the upper-level server of step three acts as a lower-level server and the operation of step three is repeated until the cluster identifier of the cluster object is fully formed. Once the cluster identifier of a cluster object is fully formed, clustering classification of the corresponding user is stopped. For example, in the new second cluster object identified as BDQH (written QHBD) in step three (QHBD; user 3: E, F; user 4: none; user 7: A, C, E, F; user 8: A, C), user 4 has no remaining behavior characteristics, so user 4 is labeled with the QHBD cluster object and removed, which gives the updated second cluster object (QHBD; user 3: E, F; user 7: A, C, E, F; user 8: A, C).
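The stopping rule in this example, where a user whose remaining behavior characteristics are exhausted is labeled and removed, can be sketched as below. The function name `finalize_exhausted` and the string form of the identifier are assumptions.

```python
def finalize_exhausted(identifier, members):
    """Split members into finished users (no characteristics left) and the rest."""
    finished = {user: identifier for user, features in members.items() if not features}
    remaining = {user: features for user, features in members.items() if features}
    return finished, remaining


# The new second cluster object from step three of the example.
members = {
    "user3": {"E", "F"},
    "user4": set(),
    "user7": {"A", "C", "E", "F"},
    "user8": {"A", "C"},
}

finished, remaining = finalize_exhausted("QHBD", members)
```

Here `finished` maps user 4 to QHBD, and `remaining` holds users 3, 7 and 8 for the next round of classification and aggregation.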

Step five, the user information of the users whose clustering is finished is counted and stored together with the corresponding cluster identifiers. For example, user 4 was labeled with the QHBD cluster object in step four, so the online platform can now accurately recommend learning resources to user 4.
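The patent does not name a storage technology. As one possible illustration of the counting and storing in step five, the sketch below uses an in-memory SQLite database from the Python standard library; the table name `user_cluster`, its columns, and the helper names are assumptions.

```python
import sqlite3


def store_assignments(conn, assignments):
    """Persist user-to-cluster-identifier assignments, replacing earlier entries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_cluster ("
        "user_id TEXT PRIMARY KEY, cluster_id TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO user_cluster (user_id, cluster_id) VALUES (?, ?)",
        assignments.items(),
    )
    conn.commit()


def count_by_cluster(conn):
    """Return the number of stored users for each cluster identifier."""
    rows = conn.execute(
        "SELECT cluster_id, COUNT(*) FROM user_cluster "
        "GROUP BY cluster_id ORDER BY cluster_id"
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
store_assignments(conn, {"user4": "QHBD"})  # user 4 finished clustering in step four
counts = count_by_cluster(conn)
```

A recommendation component could then look up a user's stored cluster identifier and select learning resources associated with that cluster.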

Embodiment 2: a distributed student clustering integration system, as shown in FIG. 2, for implementing the distributed student clustering integration method of Embodiment 1, includes user terminals, a database, and at least two levels of servers, where a plurality of user terminals are arranged in the service area of each lower-level server, and a plurality of lower-level servers are arranged in the service area of each upper-level server.

The user terminal is used for acquiring the original behavior data of the user, analyzing the original behavior data through a naive Bayes model, and extracting the behavior characteristics.

The lower-level server is used for performing a first classification and aggregation of the behavior characteristics of all users in its service area, taking the same specific characteristics as the criterion, to form a plurality of cluster objects, wherein the same specific characteristics are selected using the Pearson correlation coefficient as the criterion; using the same specific characteristics as the cluster identifier of the corresponding cluster object; and removing the same specific characteristics from all users in the cluster object to form a first cluster object.

The upper-level server is used for fusing all first cluster objects with the same cluster identifier in the lower-level servers to form a new first cluster object; performing a second classification and aggregation, taking the same specific characteristics as the criterion, to form a plurality of new cluster objects; adding the same specific characteristics to the first cluster identifier of the corresponding new cluster object to form a second cluster identifier; removing the same specific characteristics from each user in the new cluster object to form a second cluster object; and repeating the classification and aggregation with the upper-level server acting as a lower-level server until the cluster identifier of the cluster object is fully formed.

The database is used for counting and storing the user information of the users whose clustering is finished, together with the corresponding cluster identifiers.

The user terminal also comprises a principal component analysis module, which performs non-associated data elimination and dimensionality reduction on the original behavior data to obtain associated original behavior data.

The upper-level server is also used for performing a second fusion on all second cluster objects with the same second cluster identifier to form a new second cluster object.

The server also comprises a weight calculation module; when the behavior characteristics of a user simultaneously include the same specific characteristics of two or more cluster objects, the weight calculation module calculates weights for the same specific characteristics of those cluster objects, and the user is classified and aggregated according to the same specific characteristics with the larger weight.

The servers are all in communication connection with the database. Once the cluster identifier of a cluster object is fully formed, clustering classification of the corresponding user is stopped.
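The multi-level server arrangement of this embodiment can be pictured as a tree in which each upper-level server fuses what its lower-level servers report. The sketch below models only that upward fusion; the class name `Server`, its fields, and the sample data are assumptions, and the reclassification, weight calculation and database modules are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """One node in a hypothetical server hierarchy."""
    name: str
    local: list = field(default_factory=list)     # (identifier, members) pairs formed here
    children: list = field(default_factory=list)  # lower-level servers in the service area

    def collect(self):
        """Fuse cluster objects from this server and all servers below it by identifier."""
        sources = list(self.local)
        for child in self.children:
            sources.extend(child.collect().items())
        fused = {}
        for identifier, members in sources:
            fused.setdefault(frozenset(identifier), {}).update(members)
        return fused


# Two lower-level servers under one upper-level server (illustrative data).
lower_1 = Server("lower-1", local=[("AC", {"user1": {"B", "D"}})])
lower_2 = Server("lower-2", local=[("AC", {"user5": {"E", "F"}}), ("BD", {"user7": {"Q"}})])
upper = Server("upper", children=[lower_1, lower_2])

result = upper.collect()
```

Because `collect` is recursive, adding a further level only requires making `upper` a child of another `Server`, which mirrors the way an upper-level server acts as a lower-level server in the next round.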

This embodiment only explains the present invention and does not limit it. After reading this specification, those skilled in the art may modify this embodiment as needed without inventive contribution, and all such modifications are protected by patent law as long as they fall within the scope of the claims of the present invention.

Claims

1. A distributed student clustering integration method is characterized by comprising the following steps:

S3: fusing, by the upper-level server, all first cluster objects with the same cluster identifier in the lower-level servers to form a new first cluster object; performing a second classification and aggregation, taking the same specific characteristics as the criterion, to form a plurality of new cluster objects; adding the same specific characteristics to the first cluster identifier of the corresponding new cluster object to form a second cluster identifier; and removing the same specific characteristics from each user in the new cluster object to form a second cluster object;

2. The distributed student clustering integration method according to claim 1, wherein in step S1, non-associated data elimination and dimensionality reduction are performed on the original behavior data by principal component analysis (PCA) to obtain associated original behavior data.

3. The distributed student clustering integration method according to claim 1, wherein in step S3, the upper-level server performs a second fusion on all second cluster objects with the same second cluster identifier to form a new second cluster object.

4. The distributed student clustering integration method according to claim 1, wherein when the behavior characteristics of a user simultaneously include the same specific characteristics of two or more cluster objects, weights are calculated for the same specific characteristics of those cluster objects, and the user is classified and aggregated according to the same specific characteristics with the larger weight.

5. The distributed student clustering integration method according to claim 1, wherein clustering classification of the corresponding user is stopped once the cluster identifier of the cluster object is fully formed.

6. A distributed student clustering integrated system is characterized by comprising user terminals, a database and at least two levels of servers, wherein a plurality of user terminals are arranged in a service area to which lower level servers belong;

7. The distributed student clustering integration system of claim 6, wherein the user terminal further comprises a principal component analysis module, which performs non-associated data elimination and dimensionality reduction on the original behavior data to obtain associated original behavior data.

8. The distributed student clustering integration system of claim 6, wherein the upper-level server is further configured to perform a second fusion on all second cluster objects with the same second cluster identifier to form a new second cluster object.

9. The distributed student clustering integration system of claim 6, wherein the server further comprises a weight calculation module; when the behavior characteristics of a user simultaneously include the same specific characteristics of two or more cluster objects, the weight calculation module calculates weights for the same specific characteristics of those cluster objects, and the user is classified and aggregated according to the same specific characteristics with the larger weight.

10. The distributed student clustering integration system of claim 6, wherein the servers are each communicatively coupled to the database; and once the cluster identifier of a cluster object is fully formed, clustering classification of the corresponding user is stopped.