CN112712115A - Network user group division method and system - Google Patents


Info

Publication number
CN112712115A
Authority
CN
China
Prior art keywords
clustering
user data
user
data samples
graph
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011601614.7A
Other languages
Chinese (zh)
Inventor
杜航原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi University
Original Assignee
Shanxi University
Application filed by Shanxi University
Priority to CN202011601614.7A
Publication of CN112712115A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention discloses a network user group division method and system. The method comprises: acquiring user data samples corresponding to the user group to be divided, and clustering the user data samples with preset clustering algorithms to obtain base clustering division results of the user data samples; calculating a similarity matrix of the user data samples based on the base clustering division results; obtaining a graph data representation corresponding to the user data samples based on the similarity matrix; and performing clustering integration on the graph data representation based on a graph neural network to obtain a user group division result. By combining the base clustering divisions, the invention mines the relationships between network users, and it uses a graph neural network to carry out the clustering integration task, thereby improving the accuracy of the network user group division result.

Description

Network user group division method and system
Technical Field
The invention relates to the field of data mining, in particular to a network user group division method and a network user group division system.
Background
As a broadcast-style network platform, the microblog gives users a broad space for sharing and communication, and it has attracted a huge user base by virtue of its real-time, concise and open character. Data show that in 2018 the number of active microblog users reached 462 million, with growth exceeding 70 million for three consecutive years; the number of vertical fields on the microblog expanded to 60, and 32 of those fields each drew more than one billion reads per month. Faced with a growing user group, how a microblog operator can provide more accurate services to its users is a problem that currently needs to be solved. The massive data that microblog users generate on the platform contains rich information about user behavior; by analyzing this user data, groups of users with similar interests and preferences can be found, which supports the optimization of personalized services on the microblog platform.
At present, microblog users are mainly partitioned with a single clustering algorithm. On the one hand, a single clustering algorithm is deficient in the reliability and stability of the user partition; on the other hand, such algorithms do not fully mine the relationships among microblog users, so the resulting user partition is not ideal.
Disclosure of Invention
The invention provides a network user group division method and system, aiming to solve the technical problems that existing network user group division methods do not fully mine the similarity between user data samples, and that a single clustering algorithm is insufficiently reliable and stable when dividing a network user group.
In order to solve the technical problems, the invention provides the following technical scheme:
In one aspect, the present invention provides a network user group division method, including:
acquiring user data samples corresponding to the user group to be divided, and clustering the user data samples with preset clustering algorithms to obtain base clustering division results of the user data samples;
calculating a similarity matrix of the user data samples based on the base clustering division results;
obtaining a graph data representation corresponding to the user data samples based on the similarity matrix;
and performing clustering integration on the graph data representation based on a graph neural network to obtain a user group division result.
Wherein clustering the user data samples with preset clustering algorithms to obtain the base clustering division results of the user data samples comprises:
selecting the number of categories into which the user data samples are to be clustered;
and clustering the user data samples with a plurality of preset different clustering algorithms according to the number of categories, to obtain the base clustering division results of the user data samples.
Wherein calculating the similarity matrix of the user data samples based on the base clustering division results comprises:
calculating the similarity matrix of the user data samples with a weighted connected-triple (WCT) algorithm based on the base clustering division results; wherein the weighted connected-triple algorithm comprises the following steps:
calculating the similarity between intersecting clusters in the base clustering division results;
calculating the similarity between disjoint clusters in the base clustering division results;
and calculating the similarity matrix between the user data samples based on the similarity between the intersecting clusters and the similarity between the disjoint clusters in the base clustering division results.
Wherein obtaining the graph data representation corresponding to the user data samples based on the similarity matrix comprises:
taking the similarity matrix as an adjacency matrix that expresses the adjacency relations among the user data samples, and transforming the representation of the user data samples in the feature space into the corresponding graph data representation.
Wherein performing clustering integration on the graph data representation based on the graph neural network to obtain the user group division result comprises:
learning a low-dimensional embedding of the graph data representation with a preset graph autoencoder;
clustering the low-dimensional embedding with the K-means clustering algorithm to obtain initial cluster centers;
calculating the likelihood distribution of the low-dimensional embedding from the low-dimensional embedding and the cluster centers;
calculating the target distribution of the low-dimensional embedding from the likelihood distribution;
and supervising the clustering integration process with the likelihood distribution while guiding the learning of the low-dimensional embedding through the clustering integration target, so as to form a self-supervised optimization model for clustering integration and obtain the user group division result.
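The text does not specify the architecture of the graph autoencoder. As a minimal sketch, assuming a two-layer graph convolutional encoder with an inner-product decoder (a common design, not confirmed by this document), the forward pass that produces the low-dimensional embedding could look as follows. The function name `gae_forward` and the weight matrices `W0` and `W1` are hypothetical, and the weights here are untrained.

```python
import numpy as np

def gae_forward(A_norm, X, W0, W1):
    """Forward pass of an assumed two-layer graph autoencoder.

    A_norm: (n, n) normalized adjacency matrix of the sample graph.
    X:      (n, d) node feature matrix.
    W0, W1: weight matrices (hypothetical; a real model would learn them).
    """
    # First graph convolution followed by ReLU.
    H = np.maximum(A_norm @ X @ W0, 0.0)
    # Second graph convolution gives the low-dimensional embedding Z.
    Z = A_norm @ H @ W1
    # Inner-product decoder: reconstructed edge probabilities.
    A_rec = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
    return Z, A_rec

# Usage with random, untrained weights on a 4-node graph.
rng = np.random.default_rng(0)
A_demo = np.full((4, 4), 0.25)
X_demo = rng.normal(size=(4, 5))
Z_demo, A_rec_demo = gae_forward(A_demo, X_demo,
                                 rng.normal(size=(5, 3)),
                                 rng.normal(size=(3, 2)))
```

In a trained model the weights would be fitted so that `A_rec` approximates the adjacency matrix; that training step is omitted here.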
In another aspect, the present invention further provides a network user group partitioning system, including:
the base clustering module is used for acquiring user data samples corresponding to user groups to be partitioned, and clustering the user data samples by adopting a preset clustering algorithm to obtain base clustering partitioning results of the user data samples;
the similarity calculation module is used for calculating a similarity matrix of the user data sample based on the base clustering division result obtained by the base clustering module;
the graph data representation module is used for obtaining graph data representation corresponding to the user data sample based on the similarity matrix of the user data sample calculated by the similarity calculation module;
and the graph neural network clustering integration module is used for clustering and integrating the graph data representation obtained by the graph data representation module based on the graph neural network to obtain a user group division result.
Wherein the base clustering module is specifically configured to:
selecting the number of categories to which the user data samples are to be clustered;
and clustering the user data samples by adopting a plurality of preset different clustering algorithms according to the category number to obtain a base clustering division result of the user data samples.
Wherein the similarity calculation module is specifically configured to:
calculate the similarity matrix of the user data samples with a weighted connected-triple (WCT) algorithm based on the base clustering division results; wherein the weighted connected-triple algorithm comprises the following steps:
calculating the similarity between intersecting clusters in the base clustering division results;
calculating the similarity between disjoint clusters in the base clustering division results;
and calculating the similarity matrix between the user data samples based on the similarity between the intersecting clusters and the similarity between the disjoint clusters in the base clustering division results.
Wherein the graph data representation module is specifically configured to:
and taking the similarity matrix as an adjacency matrix to express the adjacency relation among the user data samples, and transforming the data representation of the user data samples in the feature space into corresponding graph data representation.
Wherein the graph neural network clustering integration module is specifically configured to:
learn a low-dimensional embedding of the graph data representation with a preset graph autoencoder;
cluster the low-dimensional embedding with the K-means clustering algorithm to obtain initial cluster centers;
calculate the likelihood distribution of the low-dimensional embedding from the low-dimensional embedding and the cluster centers;
calculate the target distribution of the low-dimensional embedding from the likelihood distribution;
and supervise the clustering integration process with the likelihood distribution while guiding the learning of the low-dimensional embedding through the clustering integration target, so as to form a self-supervised optimization model for clustering integration and obtain the user group division result.
In yet another aspect, the present invention also provides an electronic device comprising a processor and a memory; wherein the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the above-described method.
In yet another aspect, the present invention also provides a computer-readable storage medium having at least one instruction stored therein, the instruction being loaded and executed by a processor to implement the above method.
The technical scheme provided by the invention has the beneficial effects that at least:
The invention adopts a clustering integration framework based on a graph neural network to perform clustering integration analysis on user data. The graph data representation obtained from the base clusterings of the user data fully reflects the global similarity relations among the user data samples; the invention uses a graph neural network, which is better suited to processing graph data; and the self-supervised model makes information propagation and data mapping in the graph autoencoder obey the final clustering integration target. A better network user partition can therefore be obtained and the accuracy of the network user group division result is improved, which supports network operators in optimizing personalized services and improving marketing returns.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a network user group division method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a network user group division method according to a second embodiment of the present invention;
Fig. 3 is a diagram of a clustering integration process based on a graph neural network according to a second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
First embodiment
This embodiment provides a network user group division method. Its core idea is to mine the relationships among network users by combining base clustering divisions, and to carry out the clustering integration task with a graph neural network, thereby improving the accuracy of the network user group division result. The method may be implemented by an electronic device, which may be a terminal or a server. The execution flow of the method is shown in Fig. 1 and comprises the following steps:
S101, acquiring user data samples corresponding to the user group to be divided, and clustering the user data samples with preset clustering algorithms to obtain base clustering division results of the user data samples;
Specifically, in this embodiment, S101 may include the following process:
selecting the number of categories into which the user data samples are to be clustered; and clustering the user data samples with a plurality of preset different clustering algorithms, to obtain the base clustering division results of the user data samples.
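The embodiment does not name the clustering algorithms used to produce the base clusterings. The sketch below is one illustrative way to obtain several different partitions for a fixed number of categories: it runs a small Lloyd-style routine as K-means (squared Euclidean distance, mean centers) and as K-medians (Manhattan distance, median centers) from several random seeds. The choice of these two algorithms, the seeds, and the names `lloyd` and `base_clusterings` are all assumptions made for illustration.

```python
import numpy as np

def lloyd(X, k, seed, metric="l2", n_iter=50):
    """Lloyd-style clustering: K-means for metric='l2', K-medians for 'l1'."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        diff = X[:, None, :] - centers[None, :, :]
        if metric == "l2":
            dist = (diff ** 2).sum(axis=2)
        else:
            dist = np.abs(diff).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):  # keep the old center if a cluster becomes empty
                centers[j] = pts.mean(axis=0) if metric == "l2" else np.median(pts, axis=0)
    return labels

def base_clusterings(X, k, seeds=(0, 1, 2)):
    """Return a list of label arrays, one per base clustering."""
    parts = []
    for s in seeds:
        parts.append(lloyd(X, k, s, metric="l2"))
        parts.append(lloyd(X, k, s, metric="l1"))
    return parts

# Usage on a toy data set with two obvious groups.
X_demo = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
parts_demo = base_clusterings(X_demo, k=2)
```

Cluster labels are arbitrary within each partition, so label 0 in one base clustering need not match label 0 in another; the similarity step that follows only relies on co-membership.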
S102, calculating a similarity matrix of the user data samples based on the base clustering division results;
Specifically, in this embodiment, S102 may use the weighted connected-triple (WCT) algorithm to calculate the similarity matrix of the user data samples, where the WCT algorithm includes the following steps:
calculating the similarity between intersecting clusters in the base clustering division results;
calculating the similarity between disjoint clusters in the base clustering division results;
and calculating the similarity matrix between the user data samples based on the similarity between the intersecting clusters and the similarity between the disjoint clusters in the base clustering division results.
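The three WCT steps can be sketched as follows. The formulas are not given in this document; the sketch assumes the formulation commonly used in the link-based cluster ensemble literature: a Jaccard weight between intersecting clusters, a connected-triple score between clusters that share neighbors, and a decay factor `dc` applied when two samples fall in different clusters. The value `dc=0.9` and the function name `wct_similarity` are assumptions.

```python
import numpy as np

def wct_similarity(labelings, dc=0.9):
    """Sample-by-sample similarity matrix from several base clusterings.

    labelings: list of 1-D integer label arrays, one per base clustering.
    dc: decay factor in (0, 1]; an assumed parameter, not from the source text.
    """
    labelings = [np.asarray(l) for l in labelings]
    n = len(labelings[0])

    # One boolean membership row per cluster, over all base clusterings.
    rows = [l == c for l in labelings for c in np.unique(l)]
    B = np.array(rows, dtype=float)                # (n_clusters, n_samples)

    # Step 1: intersecting clusters are linked by a Jaccard weight.
    inter = B @ B.T
    size = B.sum(axis=1)
    W = inter / (size[:, None] + size[None, :] - inter)
    np.fill_diagonal(W, 0.0)

    # Step 2: clusters (including disjoint ones) are compared through shared
    # neighbors: sum over k of min(W[x, k], W[y, k]), scaled by the maximum.
    wct = np.minimum(W[:, None, :], W[None, :, :]).sum(axis=2)
    np.fill_diagonal(wct, 0.0)
    top = wct.max()
    sim = wct / top if top > 0 else wct

    # Step 3: sample similarity, averaged over the base clusterings.
    S = np.zeros((n, n))
    offset = 0
    for l in labelings:
        uniq, local = np.unique(l, return_inverse=True)
        cid = offset + local                       # global cluster index per sample
        same = cid[:, None] == cid[None, :]
        S += np.where(same, 1.0, dc * sim[cid[:, None], cid[None, :]])
        offset += len(uniq)
    return S / len(labelings)

# Usage: two base clusterings of six samples.
S_demo = wct_similarity([[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]])
```

Two samples that share a cluster in every base clustering get similarity 1; samples that are never co-clustered can still receive a nonzero similarity when their clusters overlap with common third clusters.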
S103, obtaining a graph data representation corresponding to the user data samples based on the similarity matrix;
Specifically, in this embodiment, S103 may include the following step:
taking the similarity matrix as an adjacency matrix that expresses the adjacency relations among the user data samples, and transforming the representation of the user data samples in the feature space into the corresponding graph data representation.
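A minimal sketch of this transformation follows. It treats the similarity matrix as a weighted adjacency matrix and pairs it with the original feature matrix as node attributes. The symmetric normalization D^(-1/2) A D^(-1/2) with unit self-loops is a preprocessing step commonly applied before graph convolutions and is an assumption here, as is the function name `similarity_to_graph`.

```python
import numpy as np

def similarity_to_graph(S, X):
    """Turn a sample similarity matrix and features into graph data.

    Returns (A_norm, X): a normalized weighted adjacency matrix and the
    node feature matrix. The normalization is an assumed convention.
    """
    A = np.array(S, dtype=float)
    np.fill_diagonal(A, 1.0)                 # unit self-loops
    deg = A.sum(axis=1)                      # weighted node degrees
    inv_sqrt = 1.0 / np.sqrt(deg)
    A_norm = A * inv_sqrt[:, None] * inv_sqrt[None, :]
    return A_norm, np.asarray(X, dtype=float)

# Usage: two samples with similarity 0.5 and three features each.
A_demo, X_nodes = similarity_to_graph([[1.0, 0.5], [0.5, 1.0]],
                                      [[1, 0, 2], [0, 1, 2]])
```

Each sample becomes a node, each nonzero similarity becomes a weighted edge, and the feature-space representation is carried along as node attributes.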
S104, performing clustering integration on the graph data based on the graph neural network to obtain a user group division result.
Specifically, in this embodiment, S104 may include the following steps:
learning a low-dimensional embedding of the graph data representation obtained in the previous step with a preset graph autoencoder;
clustering the low-dimensional embedding with the K-means clustering algorithm to obtain initial cluster centers;
calculating the likelihood distribution of the low-dimensional embedding from the low-dimensional embedding and the cluster centers;
calculating the target distribution of the low-dimensional embedding from the likelihood distribution;
supervising the clustering integration process with the likelihood distribution while guiding the learning of the low-dimensional embedding through the clustering integration target, so as to form a self-supervised optimization model for clustering integration and thereby optimize the clustering integration result.
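The text does not give formulas for the likelihood distribution or the target distribution. One common choice in the deep clustering literature, stated here purely as an assumption, uses a Student's t kernel between each embedding and each cluster center, and sharpens it to obtain the target. The symbols z_i (embedding of sample i), \mu_j (center of cluster j) and \alpha (degrees of freedom, often 1) are introduced for illustration only.

```latex
% Likelihood (soft assignment) of sample i to cluster j
q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}
              {\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}

% Target distribution: squared and frequency-normalized likelihood
p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}},
\qquad f_j = \sum_i q_{ij}

% Clustering objective minimized during self-supervised training
L = \mathrm{KL}(P \,\Vert\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Under this assumption the target distribution emphasizes confident assignments, and minimizing L pulls both the embedding and the cluster centers toward them.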
In this embodiment, the user data samples are clustered with preset clustering algorithms to obtain base clustering division results; a similarity matrix of the user data samples is calculated from the base clustering division results; a graph data representation corresponding to the user data samples is obtained from the similarity matrix; and clustering integration is performed on the graph data based on a graph neural network to obtain the user group division result. This improves the accuracy of the network user group division result and supports network operators in optimizing personalized services and improving marketing returns.
Second embodiment
In this embodiment, the network user group division method is applied to a microblog user data sample set: clustering integration analysis is performed on the sample set, and the microblog users are divided into groups according to the clustering integration result, which helps microblog operators provide personalized services to users. The method may be implemented by an electronic device, which may be a terminal or a server. The execution flow of the network user group division method is shown in Fig. 2 and comprises the following steps:
S101, collecting microblog user information and extracting data features, then proceeding to S102;
Specifically, in this embodiment, this step includes: crawling microblog user information with a web crawler tool, where the crawled information comprises basic user information and microblog account information; the microblog account information comprises, for the accounts in the user's following list, the microblog account name, microblog authentication, profile, follower count and following count.
S102, checking and preprocessing the user information data, then proceeding to S103;
Specifically, in this embodiment, this step includes the following sub-steps:
S1021, data checking
Before the clustering integration analysis is carried out, it is first determined whether the selected data samples are representative of the whole population: three indicators, gender, age and region, are selected, and the data samples are compared against standard data;
S1022, user filtering
The crawled microblog users include silent users, who are mainly characterized by following few microblog accounts and posting few microblogs; such users do not truly reflect interests and preferences and need to be removed. In this embodiment, a microblog user who follows fewer accounts than one tenth of the mean number of followed accounts over all microblog users, and who has posted fewer than ten microblogs, is marked as a "silent user" and removed from the data table;
S1023, classifying the accounts followed by microblog users
The "profile" and "authentication" fields of the microblog accounts followed by a user are used to identify accounts of different categories and to classify the accounts in the following list. Following a mainstream classification, this embodiment divides the accounts followed by a user into friend microblogs, celebrity microblogs and functional microblogs. A friend microblog is the microblog of a person close to the user; a celebrity microblog is the account of a representative well-known person in some field; a functional microblog is an account with a certain social function, typically the officially authenticated account of an industry organization, the news account of a news outlet, and the like;
S1024, representing the interests of microblog users
Representing the interests of a microblog user comprises determining an interest set, removing invalid accounts, and mapping to the interest set. Determining the interest set means classifying the interests of microblog users, with reference to the classification system of mainstream microblog platforms and the field classification of verified ("V") microblog accounts, to form an interest set. Removing invalid accounts means removing friend accounts followed by the user that do not reflect the user's interests and hobbies, and retaining the accounts that clearly reflect the user's interests. Mapping to the interest set means that, for any account in the account set, there is always an interest in the interest set to which that account corresponds. Accounts such as functional microblogs that reflect the same category of interest preference need to be merged and classified together. Specifically, following the mainstream classification currently used on the network, the interests of microblog users are divided into the following categories: fashion shopping, food, travel photography, sports, movie entertainment, music, game animation, literature reading, industrial work, and IT digital.
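The silent-user rule of S1022 and the interest mapping of S1024 can be sketched as below. The record fields (`following`, `posts`), the sample account names, and the decision to represent a user as the proportion of followed accounts per interest category are all assumptions for illustration; the text only states the filtering thresholds and the ten interest categories.

```python
INTERESTS = [
    "fashion shopping", "food", "travel photography", "sports",
    "movie entertainment", "music", "game animation",
    "literature reading", "industrial work", "IT digital",
]

def remove_silent_users(users):
    """Drop users who follow fewer accounts than one tenth of the mean
    following count AND have posted fewer than ten microblogs."""
    mean_following = sum(u["following"] for u in users) / len(users)
    threshold = mean_following / 10
    return [u for u in users
            if not (u["following"] < threshold and u["posts"] < 10)]

def interest_vector(followed_accounts, account_to_interest):
    """Map followed accounts to interest categories and return proportions.

    Accounts missing from the mapping (for example friends' accounts) are
    treated as invalid and skipped.
    """
    counts = [0] * len(INTERESTS)
    for account in followed_accounts:
        interest = account_to_interest.get(account)
        if interest is not None:
            counts[INTERESTS.index(interest)] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

# Usage with hypothetical records and account names.
users_demo = [
    {"id": "a", "following": 200, "posts": 50},
    {"id": "b", "following": 3, "posts": 2},      # silent user
    {"id": "c", "following": 100, "posts": 5},    # few posts, but follows many
]
kept_demo = remove_silent_users(users_demo)
mapping_demo = {"sports_news": "sports", "daily_recipes": "food"}
vec_demo = interest_vector(["sports_news", "daily_recipes", "old_friend"],
                           mapping_demo)
```

The resulting vectors, one per remaining user, are one possible form of the feature-space representation that the base clustering step operates on.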
S103, clustering the preprocessed data to obtain base clusterings, then proceeding to S104;
Specifically, in this embodiment, this step includes: selecting the number K of categories into which the data are to be clustered, and clustering the data samples with several common, different clustering algorithms to obtain the base clustering divisions of the data samples.
S104, calculating the similarity between users to obtain the user similarity matrix, then proceeding to S105;
Specifically, in this embodiment, this step includes: calculating the similarity between users with the WCT algorithm to obtain the user similarity matrix; the WCT algorithm mainly comprises the following steps:
calculating the similarity between intersecting clusters in the base clusterings of the data samples; calculating the similarity between disjoint clusters in the base clusterings; and calculating the similarity between the user data samples.
S105, obtaining the graph data representation of the users from the similarity matrix, then proceeding to S106;
Specifically, in this embodiment, this step includes: taking the similarity matrix as an adjacency matrix that expresses the adjacency relations among the user data samples, and transforming the representation of the user data samples in the feature space into the corresponding graph data representation. This fully reflects the relationships among the microblog user data.
S106, learning a low-dimensional embedding of the graph data representation with a graph autoencoder, then proceeding to S107;
S107, clustering the low-dimensional embedding with the K-means clustering algorithm to obtain initial cluster centers, then proceeding to S108;
S108, calculating the likelihood distribution from the low-dimensional embedding and the cluster centers, then proceeding to S109;
S109, calculating the target distribution from the likelihood distribution, then proceeding to S110;
S110, minimizing the loss function, then proceeding to S111;
S111, judging whether the set threshold is reached; if so, proceeding to S112, and if not, returning to S108;
S112, outputting the group division result of the microblog users.
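The loop formed by S108 to S112 can be sketched as follows, under several loud assumptions that do not come from this document: the likelihood distribution is a Student's t kernel between embeddings and cluster centers, the target distribution is its sharpened and frequency-normalized square, the loss is the KL divergence between the two, and the stopping threshold is the fraction of samples whose label changes between iterations. To stay short, the sketch keeps the embedding `Z` fixed and updates only the cluster centers by gradient descent, whereas the described method also updates the graph autoencoder. All function names are hypothetical.

```python
import numpy as np

def soft_assign(Z, centers, alpha=1.0):
    """Assumed likelihood distribution: Student's t kernel, row-normalized."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Assumed target distribution: squared, frequency-normalized likelihood."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def refine_clusters(Z, init_centers, lr=0.1, tol=0.001, max_iter=200, alpha=1.0):
    """Iterate S108 to S111 on fixed embeddings, then return labels (S112)."""
    Z = np.asarray(Z, dtype=float)
    centers = np.array(init_centers, dtype=float)      # from K-means in S107
    labels = soft_assign(Z, centers, alpha).argmax(axis=1)
    for _ in range(max_iter):
        q = soft_assign(Z, centers, alpha)             # S108
        p = target_distribution(q)                     # S109
        # S110: one gradient step on KL(P || Q) with respect to the centers.
        diff = Z[:, None, :] - centers[None, :, :]
        d2 = (diff ** 2).sum(axis=2)
        coef = (p - q) / (1.0 + d2 / alpha)
        grad = -(alpha + 1.0) / alpha * (coef[:, :, None] * diff).sum(axis=0)
        centers -= lr * grad
        # S111: stop when the share of changed labels falls below the threshold.
        new_labels = soft_assign(Z, centers, alpha).argmax(axis=1)
        changed = np.mean(new_labels != labels)
        labels = new_labels
        if changed < tol:
            break
    return labels, centers

# Usage on four 2-D embeddings with two rough initial centers.
Z_demo = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels_demo, centers_demo = refine_clusters(Z_demo, [[0.5, 0.5], [4.5, 4.5]])
```

In a full implementation the same loss would also be backpropagated through the graph autoencoder, so that the embedding itself is shaped by the clustering integration target.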
Steps S106 to S112 above can be summarized as follows: as shown in Fig. 3, an improved graph neural network clustering integration framework performs clustering integration on the user data, and the clustering integration target guides the learning of the low-dimensional embedding, forming a self-supervised optimization model for clustering integration that optimizes the clustering integration result. The clustering integration result obtained once the iteration completes is the optimal user partition of the microblog network.
In summary, the method of this embodiment generates the graph data representation of the microblog users from existing base clusterings, which fully reflects the global similarity relations among the samples; it uses a graph neural network, which is better suited to processing graph data with missing attributes; and its self-supervised model makes information propagation and data mapping in the graph autoencoder obey the final clustering integration target, so that the learned low-dimensional embedding favors an optimal microblog user group division result. A better microblog user partition can thus be obtained and the accuracy of the microblog user group division result is improved, which supports microblog operators in optimizing personalized services and improving marketing returns.
Third embodiment
The embodiment provides a network user group division system, which comprises the following modules:
the base clustering module is used for acquiring user data samples corresponding to user groups to be partitioned, and clustering the user data samples by adopting a preset clustering algorithm to obtain base clustering partitioning results of the user data samples;
the similarity calculation module is used for calculating a similarity matrix of the user data sample based on the base clustering division result obtained by the base clustering module;
the graph data representation module is used for obtaining graph data representation corresponding to the user data sample based on the similarity matrix of the user data sample calculated by the similarity calculation module;
and the graph neural network clustering integration module is used for clustering and integrating the graph data representation obtained by the graph data representation module based on the graph neural network to obtain a user group division result.
The network user group division system of this embodiment corresponds to the network user group division method of the first embodiment; the functions realized by the functional modules of the system correspond one-to-one to the steps of the method, and are therefore not described again here.
Fourth embodiment
The present embodiment provides an electronic device, which includes a processor and a memory; wherein the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the method of the above embodiment.
The electronic device may vary considerably in configuration or performance, and may include one or more processors (CPUs) and one or more memories, where at least one instruction is stored in the memory, and the instruction is loaded by the processor to perform the following steps:
S101, acquiring user data samples corresponding to the user group to be divided, and clustering the user data samples with preset clustering algorithms to obtain base clustering division results of the user data samples;
S102, calculating a similarity matrix of the user data samples based on the base clustering division results;
S103, obtaining a graph data representation corresponding to the user data samples based on the similarity matrix;
S104, performing clustering integration on the graph data based on the graph neural network to obtain a user group division result.
The electronic equipment of the embodiment performs cluster analysis on the user data samples to obtain a base cluster division result of the user data samples; calculating a similarity matrix of the user data samples based on the base clustering division result; obtaining graph data representation corresponding to the user data sample based on the similarity matrix; and carrying out clustering integration on the graph data based on the graph neural network to obtain a user group division result. The accuracy of the network group user partition result is improved, and therefore support is provided for network operators to better optimize personalized services and promote marketing benefits.
Fifth embodiment
This embodiment provides a computer-readable storage medium having at least one instruction stored therein, the instruction being loaded and executed by a processor to implement the above method. The computer-readable storage medium may be, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device. The instructions stored therein may be loaded by a processor in a terminal to perform the following steps:
S101, acquiring user data samples corresponding to the user group to be divided, and clustering the user data samples with preset clustering algorithms to obtain base clustering division results of the user data samples;
S102, calculating a similarity matrix of the user data samples based on the base clustering division results;
S103, obtaining a graph data representation corresponding to the user data samples based on the similarity matrix;
S104, performing clustering integration on the graph data based on the graph neural network to obtain a user group division result.
In the program method stored in the computer-readable storage medium of this embodiment, a base clustering partitioning result of a user data sample is obtained by obtaining the user data sample corresponding to a user group to be partitioned and clustering the user data sample; calculating a similarity matrix of the user data samples based on the base clustering division result; obtaining graph data representation corresponding to the user data sample based on the similarity matrix; and carrying out clustering integration on the graph data based on the graph neural network to obtain a user group division result. The accuracy of the network group user partition result is improved, and therefore support is provided for network operators to better optimize personalized services and promote marketing benefits.
Furthermore, it should be noted that the present invention may be provided as a method, apparatus or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied in the medium.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
Finally, it should be noted that while the above describes a preferred embodiment of the invention, it will be appreciated by those skilled in the art that, once the basic inventive concepts have been learned, numerous changes and modifications may be made without departing from the principles of the invention, which shall be deemed to be within the scope of the invention. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Claims (10)

1. A method for dividing a network user group is characterized by comprising the following steps:
acquiring user data samples corresponding to user groups to be divided, and clustering the user data samples by adopting a preset clustering algorithm to obtain a base clustering division result of the user data samples;
calculating a similarity matrix of the user data samples based on the base cluster partitioning result;
obtaining graph data representation corresponding to the user data sample based on the similarity matrix;
and carrying out clustering integration on the graph data representation based on a graph neural network to obtain a user group division result.
2. The method of claim 1, wherein clustering the user data samples using a predetermined clustering algorithm to obtain a base cluster partitioning result of the user data samples comprises:
selecting the number of categories to which the user data samples are to be clustered;
and clustering the user data samples by adopting a plurality of preset different clustering algorithms according to the category number to obtain a base clustering division result of the user data samples.
3. The method of claim 1, wherein said calculating a similarity matrix of said user data samples based on said base cluster partition results comprises:
calculating a similarity matrix of the user data samples by adopting a weighted connected triple algorithm based on the base clustering division result; wherein the weighted connected triple algorithm comprises the following steps:
calculating the similarity between the intersecting clusters in the base clustering division result;
calculating the similarity between the disjoint clusters in the base clustering division result;
and calculating a similarity matrix between the user data samples based on the similarity between the intersecting clusters and the similarity between the disjoint clusters in the base clustering division result.
4. The method for dividing a network user group according to claim 1, wherein the obtaining of the graph data representation corresponding to the user data sample based on the similarity matrix comprises:
and taking the similarity matrix as an adjacency matrix to express the adjacency relation among the user data samples, and transforming the data representation of the user data samples in the feature space into corresponding graph data representation.
5. The method according to claim 1, wherein the clustering integration of the graph data representation based on the graph neural network to obtain the user population partitioning result comprises:
learning low-dimensional embedding of the graph data representation using a preset graph autoencoder;
clustering the low-dimensional embedding by adopting a K-means clustering algorithm to obtain an initial clustering center;
calculating the likelihood distribution of the low-dimensional embedding according to the low-dimensional embedding and the clustering center;
calculating the target distribution of the low-dimensional embedding according to the likelihood distribution;
and the likelihood distribution supervises the clustering integration process, and simultaneously guides the learning process of low-dimensional embedding through a clustering integration target to form a clustering integrated self-supervision optimization model so as to obtain a user group division result.
6. A network user population partitioning system, said system comprising:
the base clustering module is used for acquiring user data samples corresponding to user groups to be partitioned, and clustering the user data samples by adopting a preset clustering algorithm to obtain base clustering partitioning results of the user data samples;
the similarity calculation module is used for calculating a similarity matrix of the user data sample based on the base clustering division result obtained by the base clustering module;
the graph data representation module is used for obtaining graph data representation corresponding to the user data sample based on the similarity matrix of the user data sample calculated by the similarity calculation module;
and the graph neural network clustering integration module is used for clustering and integrating the graph data representation obtained by the graph data representation module based on the graph neural network to obtain a user group division result.
7. The system for network user population partitioning of claim 6, wherein said base clustering module is specifically configured to:
selecting the number of categories to which the user data samples are to be clustered;
and clustering the user data samples by adopting a plurality of preset different clustering algorithms according to the category number to obtain a base clustering division result of the user data samples.
8. The system for partitioning a population of network users of claim 6, wherein the similarity calculation module is specifically configured to:
calculating a similarity matrix of the user data samples by adopting a weighted connected triple algorithm based on the base clustering division result; wherein the weighted connected triple algorithm comprises the following steps:
calculating the similarity between the intersecting clusters in the base clustering division result;
calculating the similarity between the disjoint clusters in the base clustering division result;
and calculating a similarity matrix between the user data samples based on the similarity between the intersecting clusters and the similarity between the disjoint clusters in the base clustering division result.
9. The system for partitioning a population of network users of claim 6, wherein said graph data representation module is specifically configured to:
and taking the similarity matrix as an adjacency matrix to express the adjacency relation among the user data samples, and transforming the data representation of the user data samples in the feature space into corresponding graph data representation.
10. The network user population partitioning system of claim 6, wherein said graph neural network clustering integration module is specifically configured to:
learning low-dimensional embedding of the graph data representation using a preset graph autoencoder;
clustering the low-dimensional embedding by adopting a K-means clustering algorithm to obtain an initial clustering center;
calculating the likelihood distribution of the low-dimensional embedding according to the low-dimensional embedding and the clustering center;
calculating the target distribution of the low-dimensional embedding according to the likelihood distribution;
and the likelihood distribution supervises the clustering integration process, and simultaneously guides the learning process of low-dimensional embedding through a clustering integration target to form a clustering integrated self-supervision optimization model so as to obtain a user group division result.
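The likelihood distribution and target distribution named in claims 5 and 10 can be illustrated with a short Python sketch. The claims do not state the formulas, so the sketch assumes the forms commonly used in deep embedded clustering: a Student's t kernel between each embedding and each clustering center for the likelihood distribution, and a squared, frequency-normalized sharpening for the target distribution, with a KL divergence as the clustering objective. All function names, the parameter `alpha`, and the example values are hypothetical.

```python
import math
from typing import List, Sequence


def soft_assignment(embeddings: Sequence[Sequence[float]],
                    centers: Sequence[Sequence[float]],
                    alpha: float = 1.0) -> List[List[float]]:
    """Assumed likelihood distribution: Student's t kernel, row-normalized."""
    q = []
    for z in embeddings:
        weights = []
        for mu in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(z, mu))
            weights.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        q.append([w / total for w in weights])
    return q


def target_distribution(q: List[List[float]]) -> List[List[float]]:
    """Assumed target distribution: q squared over soft cluster frequency."""
    k = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(k)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p: List[List[float]], q: List[List[float]]) -> float:
    """KL(P || Q) summed over all samples and clusters."""
    return sum(
        p_ij * math.log(p_ij / q_ij)
        for p_row, q_row in zip(p, q)
        for p_ij, q_ij in zip(p_row, q_row)
        if p_ij > 0.0
    )


# Hypothetical low-dimensional embeddings and initial clustering centers.
embeddings = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
centers = [(0.0, 0.0), (1.0, 1.0)]
q = soft_assignment(embeddings, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a full model the loss would be minimized jointly with the graph autoencoder's reconstruction objective, so that the embedding and the clustering centers are refined together; that training loop is outside the scope of this sketch.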
CN202011601614.7A 2020-12-29 2020-12-29 Network user group division method and system Pending CN112712115A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011601614.7A CN112712115A (en) 2020-12-29 2020-12-29 Network user group division method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011601614.7A CN112712115A (en) 2020-12-29 2020-12-29 Network user group division method and system

Publications (1)

Publication Number Publication Date
CN112712115A true CN112712115A (en) 2021-04-27

Family

ID=75546841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011601614.7A Pending CN112712115A (en) 2020-12-29 2020-12-29 Network user group division method and system

Country Status (1)

Country Link
CN (1) CN112712115A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108687A (en) * 2017-12-18 2018-06-01 苏州大学 A kind of handwriting digital image clustering method, system and equipment
CN109447833A (en) * 2018-09-26 2019-03-08 江苏大学 A kind of extensive microblog users community of interest discovery method
CN111464529A (en) * 2020-03-31 2020-07-28 山西大学 Network intrusion detection method and system based on cluster integration
CN111726765A (en) * 2020-05-29 2020-09-29 山西大学 WIFI indoor positioning method and system for large-scale complex scene

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
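LIANG BAI et al.: "An Information-Theoretical Framework for Cluster Ensemble", 《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》 *
杜航原 et al.: "A deep self-supervised clustering ensemble algorithm", 《智能系统学报》 *
杜航原 et al.: "An overlapping community detection algorithm based on network node centrality measures", 《计算机研究与发展》 *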

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210427
