CN111125469B - User clustering method and device of social network and computer equipment - Google Patents

User clustering method and device of social network and computer equipment

Info

Publication number
CN111125469B
Authority
CN
China
Prior art keywords
clustering
cluster
user
vector
center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911247467.5A
Other languages
Chinese (zh)
Other versions
CN111125469A (en)
Inventor
陈子忠
彭道万
夏书银
李曹枭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201911247467.5A
Publication of CN111125469A
Application granted
Publication of CN111125469B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of machine learning and data mining, and particularly relates to a user clustering method and device for a social network, and to computer equipment. The method comprises: obtaining account information of each user to be clustered in a social network, wherein the account information comprises a user ID, user region, user gender, follower count, following count, content tags and posting time; vectorizing each piece of account information to form a vector data set; selecting a plurality of vectors from the vector data set as initial cluster centers, clustering the users according to the initial cluster centers, updating the cluster centers according to the clustering result, and continuing to cluster the users according to the updated cluster centers until the cluster centers no longer change, thereby obtaining a plurality of clusters, each containing at least one user to be clustered. The improved initial cluster center selection algorithm and the improved clustering framework greatly reduce the amount of computation, thereby improving the efficiency of the whole method and device.

Description

User clustering method and device of social network and computer equipment
Technical Field
The invention belongs to the field of machine learning and data mining. It relates to a fast and efficient k-means clustering algorithm and its application to social networks, and in particular to a user clustering method and device for a social network and to computer equipment.
Background
Social network analysis combines the social sciences with the natural sciences. Research on social networks covers networks such as e-mail, WeChat, QQ, Sina Weibo, Twitter and Facebook. These networks contain many kinds of objects, which need to be classified. The k-means clustering algorithm is one of the most common, simple and effective clustering algorithms. The standard k-means algorithm was proposed independently, in different scientific fields, by Steinhaus in 1956, Lloyd in 1957, Ball & Hall in 1965, and MacQueen in 1967. Cluster analysis is a technique for statistical data analysis and is widely used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.
The traditional k-means algorithm performs well on small batches of data, in both efficiency and clustering quality. In a social network, however, massive amounts of user data must be processed, so the efficiency of the clustering algorithm is critical. Existing solutions rely mainly on traditional clustering algorithms, which become inefficient in big-data scenarios: convergence is extremely slow, time complexity is high, the algorithms are sensitive to noise and outliers, and the clustering result depends on the initial cluster centers.
Improvements to the traditional k-means algorithm fall into three main categories: first, selection of the initial cluster centers; second, approximate k-means; and third, accelerated k-means. David Arthur et al. proposed k-means++, an initial cluster center selection method based on $D^2$-sampling. Its core idea is that the initial centers should be spread as far apart as possible. It is the most widely used method for initializing cluster centers. Although it remedies the standard k-means algorithm's random selection of initial centers, its inherently sequential nature limits its scalability, so the algorithm cannot be parallelized for very large data sets. When clustering massive data, approximate k-means is a very effective approach. In recent years, researchers have proposed a number of approximate k-means methods from different perspectives. One stores the data points in a k-d tree and maintains a subset of candidate centers for each node of the tree, which saves computation time by avoiding the comparison of every point with every center. Another is based on sub-sampling the data points: it runs k-means on the sub-sampled points, and an extension adds the remaining points incrementally and reruns k-means to obtain a finer clustering. These approximate approaches are unsuitable for many applications; in social network applications, for example, their clustering is less accurate and performs poorly. There are also many exact accelerated k-means methods, but in the context of massive social network data they still suffer from problems such as extra time and space consumption and a lack of adaptivity.
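As an illustration of the $D^2$-sampling idea behind k-means++ mentioned above, the following is a minimal Python sketch. It is not part of the patent; the function name `kmeanspp_init` and the use of NumPy are assumptions made for this example, and it assumes the data set contains at least k distinct points.

```python
import numpy as np

def kmeanspp_init(X, k, rng):
    """Pick k initial centers from the rows of X by D^2-sampling (k-means++ seeding)."""
    # The first center is chosen uniformly at random.
    centers = [X[rng.integers(len(X))]]
    # d2[i] is the squared distance from point i to its nearest chosen center.
    d2 = np.sum((X - centers[0]) ** 2, axis=1)
    for _ in range(k - 1):
        # Each point is drawn with probability proportional to d2, so far-away
        # points are favoured and already chosen points have probability zero.
        idx = rng.choice(len(X), p=d2 / d2.sum())
        centers.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(centers)
```

Each new center depends on all previously chosen ones, which is the sequential dependency the text refers to: the k draws cannot be made independently in parallel.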
Disclosure of Invention
The invention aims to solve the efficiency problem of clustering massive data in social network scenarios. To that end it provides a fast and efficient partition-based k-means clustering algorithm, and in particular a user clustering method and device for a social network and computer equipment.
A method of clustering users of a social network, the method comprising:
step 1: acquiring account information of each user to be clustered in the social network, wherein the account information comprises a user ID, user region, user gender, follower count, following count, content tags and posting time;
step 2: vectorizing each piece of account information in the social network to form a vector data set;
step 3: selecting a plurality of vectors from the vector data set as initial cluster centers, clustering the users according to the initial cluster centers, updating the cluster centers according to the clustering result, and continuing to cluster the users according to the updated cluster centers until the cluster centers no longer change, thereby obtaining a plurality of clusters, each containing at least one user to be clustered.
In another aspect, the present invention further provides a user clustering device for a social network, including:
The data interface module is used for accessing account information of each user to be clustered in the social network;
the data shaping module is used for shaping the account information accessed by the data interface module to form a vector data set;
and the clustering result module is used for processing the vectors in the vector data set to obtain a plurality of clustered classifications, and each classification at least comprises one user to be clustered.
The invention has the beneficial effects that:
In the invention, when massive social network data is processed, the user account information is vectorized through data preprocessing, and the improved initial cluster center selection method makes the final clustering result more accurate. In the improved k-means framework, because the concept of a "core domain" (the "stable domain" of each cluster ball) is defined and the neighbor clusters of each cluster ball are identified, all computation is confined to a small range, which saves computation that would otherwise be required and improves the efficiency of clustering massive social network data. In theory, the invention reduces the time complexity of each iteration from $O(nk)$ to $O(k^2 + n)$. For very large-scale social network user clustering problems, the method and device can greatly reduce the amount of computation and thereby improve the efficiency of the whole method and device.
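To give a feel for the stated per-iteration complexities, the following worked arithmetic compares the two expressions for illustrative values. The values n = 10^6 and k = 100 are assumptions chosen for this example, not figures from the patent, and constant factors are ignored.

```latex
\begin{align*}
n &= 10^{6}, \quad k = 10^{2} \\
nk &= 10^{6} \times 10^{2} = 10^{8} \\
k^{2} + n &= 10^{4} + 10^{6} = 1.01 \times 10^{6} \\
\frac{nk}{k^{2} + n} &= \frac{10^{8}}{1.01 \times 10^{6}} \approx 99
\end{align*}
```

Under these assumed values, the operation count per iteration would shrink by roughly two orders of magnitude.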
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only a simple schematic diagram of the present invention.
FIG. 1 is an overall flow diagram of one embodiment of the present invention;
FIG. 2 is an overall flow diagram of another embodiment of the present invention;
FIG. 3 is a flow diagram of an extraction and processing of social networking data;
FIG. 4 is a detailed flow diagram of an improved k-means clustering framework in an embodiment of the present invention;
FIG. 5 is a detailed flow chart of an improved k-means clustering framework in another embodiment of the present invention;
FIG. 6 is a schematic diagram of a user clustering device of a social network according to the present invention;
FIG. 7 is a schematic diagram of a data shaping module of the present invention;
FIG. 8 is a schematic diagram of a clustering results module of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clearly and completely apparent, the technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
As shown in fig. 1, a method for clustering users in a social network includes:
step 1: acquiring account information of each user to be clustered in the social network, wherein the account information comprises a user ID, user region, user gender, follower count, following count, content tags and posting time;
step 2: vectorizing each piece of account information in the social network to form a vector data set;
step 3: selecting a plurality of vectors from the vector data set as initial cluster centers, clustering the users according to the initial cluster centers, updating the cluster centers according to the clustering result, and continuing to cluster the users according to the updated cluster centers until the cluster centers no longer change, thereby obtaining a plurality of clusters, each containing at least one user to be clustered.
In another embodiment, as shown in fig. 2, a method for clustering users of a social network may further include:
acquiring user account information in a social network (such as Sina Weibo), wherein the user account information comprises a user ID, user region, user gender, follower count, following count, microblog content tags, posting time and the like;
preprocessing user information and then vectorizing the preprocessed user information;
Selecting an initial clustering center by using an improved method;
inputting all processed data sets into an improved k-means algorithm frame, and continuously iterating until the algorithm converges;
and outputting an accurate clustering result.
In one embodiment, as shown in FIG. 3, there is a flow chart for extracting and processing social networking data, the method comprises the following steps:
carrying out data cleaning on social network data, and preprocessing the characteristic data of a user;
converting the format of the data, wherein one part of the data is represented as a numeric (digital) vector and the other part as a semantic vector. For example, the user ID is normalized to the range 0 to 1, the user's gender is encoded numerically (for example, 1 for male and 0 for female), and the user's region is represented by a number (for example, Beijing 001, Shanghai 002, Chongqing 003, and so on). Semantic information is converted into a semantic vector by the sentence-to-vector (sen2vec) method: the words in the semantic information are converted using pre-trained word vectors, all the word vectors are weighted, and the whole set of semantic information is then processed by principal component analysis to obtain a vector representation of each piece of semantic information. The user data is vectorized on the basis of these steps.
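The vectorization described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the word vectors, word weights and helper names (`WORD_VECS`, `WORD_WEIGHT`, `numeric_part`, `semantic_part`, `vectorize`) are hypothetical, and the principal component step is assumed here to mean removing each sentence vector's projection onto the first principal component, which the text does not specify.

```python
import numpy as np

# Toy stand-ins for pre-trained word vectors and word weights (hypothetical values).
WORD_VECS = {
    "music": np.array([1.0, 0.0, 0.0]),
    "travel": np.array([0.0, 1.0, 0.0]),
    "food": np.array([0.0, 0.0, 1.0]),
}
WORD_WEIGHT = {"music": 0.5, "travel": 1.0, "food": 0.8}
REGION_CODE = {"Beijing": 1, "Shanghai": 2, "Chongqing": 3}

def numeric_part(users):
    """Numeric vector: min-max normalized ID, gender as 1/0, region as a number."""
    ids = np.array([u["id"] for u in users], dtype=float)
    span = ids.max() - ids.min()
    ids = (ids - ids.min()) / span if span > 0 else np.zeros_like(ids)
    gender = np.array([1.0 if u["gender"] == "male" else 0.0 for u in users])
    region = np.array([float(REGION_CODE[u["region"]]) for u in users])
    return np.column_stack([ids, gender, region])

def semantic_part(users):
    """Semantic vector: weighted average of word vectors, then a PCA-based step."""
    sent = np.array([
        np.average([WORD_VECS[w] for w in u["tags"]], axis=0,
                   weights=[WORD_WEIGHT[w] for w in u["tags"]])
        for u in users
    ])
    # Assumption: PCA is used to remove the first principal component.
    centered = sent - sent.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc = vt[0]
    return sent - np.outer(sent @ pc, pc)

def vectorize(users):
    """Concatenate the numeric and semantic parts into one vector per user."""
    return np.hstack([numeric_part(users), semantic_part(users)])

users = [
    {"id": 100, "gender": "male", "region": "Beijing", "tags": ["music", "travel"]},
    {"id": 200, "gender": "female", "region": "Shanghai", "tags": ["food"]},
    {"id": 300, "gender": "male", "region": "Chongqing", "tags": ["travel", "food"]},
]
vectors = vectorize(users)  # one 6-dimensional row per user
```

In practice the remaining account fields (follower count, following count, posting time) would be added to the numeric part in the same way.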
In one embodiment, as shown in FIG. 4, it is a detailed flow chart of the improved k-means clustering framework, which includes the detailed process of the whole clustering. The method comprises the following steps:
step 301: selecting k vectors from the vector data set as an initial clustering center by using an improved initial clustering center method;
step 302: dividing all vectors into cluster balls represented by cluster centers closest to the vectors according to a nearest principle;
step 303: calculating the mean value of all vectors in each cluster ball to serve as a new clustering center, and calculating the radius of the cluster ball;
step 304: finding out the neighbor cluster of each cluster according to the distance relationship between the cluster sphere radius and the cluster center;
step 305: calculating the distance between each vector and the center of the adjacent cluster where the vector is located, and dividing the vector into cluster balls with the nearest distance according to the principle of proximity;
step 306: and repeating the steps 303 to 305 until the clustering center is not changed any more, and outputting a clustering result.
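Steps 303 and 304 can be illustrated with a small Python sketch. It assumes that every cluster has at least one member and uses the neighbor rule stated later in the description (half the center-to-center distance smaller than the radius of the current cluster); the function name `cluster_balls` is hypothetical.

```python
import numpy as np

def cluster_balls(X, labels, k):
    """Return centers, radii and neighbor lists of the k cluster balls."""
    # Step 303: each center is the mean of its members; the radius is the
    # distance from the center to its farthest member.
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    radii = np.array([
        np.linalg.norm(X[labels == j] - centers[j], axis=1).max() for j in range(k)
    ])
    # Step 304: cluster j is a neighbor of cluster i if half of the distance
    # between the two centers is smaller than the radius of cluster i.
    cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    neighbors = {
        i: [j for j in range(k) if j != i and cc[i, j] / 2 < radii[i]]
        for i in range(k)
    }
    return centers, radii, neighbors

X = np.array([[0, 0], [6, 0], [5, 0], [7, 0], [20, 0], [22, 0]], dtype=float)
labels = np.array([0, 0, 1, 1, 2, 2])
centers, radii, neighbors = cluster_balls(X, labels, 3)
print(neighbors)  # {0: [1], 1: [], 2: []}
```

The example shows that the relation is not symmetric: cluster 1 is a neighbor of the wide cluster 0, but the small cluster 1 has no neighbors, so none of its points needs to be compared with any other center.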
In one embodiment, the step 301 comprises:
step 3011: randomly selecting one vector data from the vector data set as a first initial clustering center;
step 3012: sampling a Markov chain of length 3k from the vector data set using the Markov chain Monte Carlo (MCMC) method, and taking the 3k data points on the Markov chain as candidate initial cluster centers;
step 3013: for the 3k candidate initial cluster centers, repeatedly merging the two closest candidates into a new initial cluster center using Prim's minimum spanning tree method, until only k remain as the initial cluster centers.
Of course, the number of vectors sampled by the Markov chain Monte Carlo method may also be 4k, 4.5k, and so on.
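Steps 3011 to 3013 can be sketched as follows. The text does not specify the target distribution of the Markov chain or how two candidates are combined, so this sketch makes three assumptions: a Metropolis-style chain whose acceptance ratio is the squared distance to the first center, merging by taking the midpoint of the pair, and a brute-force closest-pair search in place of Prim's minimum spanning tree method. The function names are hypothetical.

```python
import numpy as np

def mcmc_candidates(X, k, rng, factor=3):
    """Sample a Markov chain of length factor*k from the rows of X."""
    first = X[rng.integers(len(X))]  # step 3011: random first center

    def d2(p):
        return float(np.sum((p - first) ** 2))

    cur = X[rng.integers(len(X))]
    chain = []
    for _ in range(factor * k):  # step 3012
        prop = X[rng.integers(len(X))]
        # Assumed acceptance rule: move to the proposal with probability
        # min(1, d2(prop) / d2(cur)), favouring points far from the first center.
        if d2(cur) == 0.0 or rng.random() < d2(prop) / d2(cur):
            cur = prop
        chain.append(cur)
    return np.array(chain)

def merge_to_k(cands, k):
    """Step 3013: merge the two closest candidates until only k remain."""
    C = np.asarray(cands, dtype=float)
    while len(C) > k:
        d = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)  # ignore zero self-distances
        i, j = np.unravel_index(d.argmin(), d.shape)
        merged = (C[i] + C[j]) / 2.0  # assumed merge rule: midpoint
        C = np.vstack([np.delete(C, [i, j], axis=0), merged])
    return C
```

Because rejected proposals repeat the current state, the chain can contain duplicate points; these are at distance zero from each other and are therefore merged first.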
In one embodiment, the step 301 may further include:
randomly selecting vectors from the vector data set by reservoir sampling as the initial cluster centers: the first k vectors in the vector data set are placed into the reservoir; for the m-th element (m > k), one vector in the reservoir is replaced with probability k/m; and the k vectors finally left in the reservoir are taken as the initial cluster centers.
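The reservoir sampling procedure described above corresponds to the classic single-pass scheme, sketched here in Python. The function name `reservoir_sample` and the `seed` parameter are choices made for this example.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for m, item in enumerate(stream, start=1):
        if m <= k:
            # The first k items fill the reservoir.
            reservoir.append(item)
        elif rng.random() < k / m:
            # The m-th item replaces a uniformly chosen slot with probability k/m.
            reservoir[rng.randrange(k)] = item
    return reservoir
```

The data set is read only once and only k items are held in memory, which suits data sets too large to load in full.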
In another embodiment, as shown in fig. 5, the step 3 further includes:
step 311: first, a first iteration is carried out on a data set by using a standard k-means flow, and all data are distributed to a cluster where a central point closest to the data is located:
$b(x_i) = \arg\min_{j = 1, \dots, k} \operatorname{dis}(x_i, c_j)$
where $c_j$ is the center of cluster $j$ and $x_i$ is any sample in the sample space.
Step 312: updating all cluster centers according to the above allocation steps:
$c_j = \frac{1}{|N|} \sum_{x_i \in C_j} x_i$
where $|N|$ is the number of data samples assigned to $C_j$.
Step 313: calculating the radius of each cluster ball (the radius is the distance between the center and the data point in the cluster ball farthest from the center), defined as:
$R_j = \max \operatorname{dis}(x_i, c_j)$ for $\{x_i \mid x_i \in C_j\}$.
step 314: finding the neighbor clusters of each cluster according to the radius of each cluster ball and the distance between any two cluster balls (a cluster is defined as a neighbor cluster of the current cluster if half of the distance between the two cluster centers is smaller than the radius of the current cluster);
step 315: sequencing the adjacent clusters of each cluster from near to far;
step 316: filtering out data in a "stable domain" in each cluster sphere (a "stable domain" is defined as a set of data points that are close to the cluster center and still belong to the current cluster sphere in the current iteration);
step 317: assigning the data outside the "stable domain" of each cluster ball by the nearest principle, where the candidate clusters for assignment are the cluster ball itself and its neighbor clusters as sorted in step 315;
step 318: steps 312 through 317 are repeated until all center points no longer change.
In addition, in a specific embodiment, the vector data sets are user data crawled from Twitter and from Sina Weibo, containing 3.76 million and 6.2 million user records respectively. After all social network user data were preprocessed, the two data sets were clustered under the framework proposed by the invention. The results show that clustering is on average 51 times faster than with the traditional k-means algorithm, and that the clustering result, evaluated by WCSSD (the within-cluster sum of squared distances, where smaller is better), is 3.1% lower than that of the traditional social network user clustering algorithm.
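For reference, the WCSSD index used in this evaluation can be computed as in the short Python sketch below. The function name `wcssd` is chosen for this example, and the toy data is not from the patent.

```python
import numpy as np

def wcssd(X, centers, labels):
    """Sum of squared distances from each point to its assigned cluster center."""
    return float(np.sum((X - centers[labels]) ** 2))

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
centers = np.array([[1.0, 0.0], [10.0, 0.0]])
labels = np.array([0, 0, 1])
print(wcssd(X, centers, labels))  # 2.0
```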
Based on the same concept of the present invention, the present invention further provides a user clustering device for a social network, as shown in fig. 6, including:
the data interface module is used for accessing account information of each user to be clustered in the social network;
the data shaping module is used for shaping the account information accessed by the data interface module to form a vector data set;
and the clustering result module is used for processing the vectors in the vector data set to obtain a plurality of clustered classifications, and each classification at least comprises one user to be clustered.
As shown in fig. 7, the data shaping module includes a digital vector generating unit and a semantic vector generating unit, the digital vector generating unit is configured to convert a part of data of the account information into a digital vector, and the semantic vector generating module is configured to convert another part of data of the account information into a semantic vector.
As shown in fig. 8, the clustering result module includes an initial clustering center selecting unit, a calculating unit, and a result calculating unit; the cluster center selection unit is used for selecting an initial cluster center in the vector data set; the calculating unit is used for calculating the radius distance of the cluster balls and the distance relation between any two cluster balls; and the result operation unit is used for dividing a clustering result according to the clustering center.
The initial clustering center selecting unit comprises a random selecting subunit, a fixed selecting subunit and a merging unit; the random selection subunit is used for randomly selecting one initial clustering center, the fixed selection subunit is used for sampling a plurality of initial clustering centers from the vector data set according to the Markov chain Monte Carlo method, and the merging unit is used for merging the two closest initial clustering centers into a new initial clustering center according to Prim's minimum spanning tree method.
In addition, the invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the user clustering method provided by the invention.
In one embodiment of the invention, the invention employs a Python programming language and can operate on mainstream computer platforms. The operating system used in the implementation is CentOS 6.5, the CPU is required to be Intel i5, the memory is more than 8GB, and the hard disk space is required to be more than 32 GB.
It is understood that features of the method, device and computer equipment of the invention may be cross-referenced, and they are not described again here.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.
The above-mentioned embodiments, which further illustrate the objects, technical solutions and advantages of the present invention, should be understood that the above-mentioned embodiments are only preferred embodiments of the present invention, and should not be construed as limiting the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A method for clustering users of a social network, the method comprising:
step 1: acquiring account information of each user to be clustered in the social network, wherein the account information comprises a user ID, user region, user gender, follower count, following count, content tags and posting time;
step 2: vectorizing each piece of account information in the social network to form a vector data set;
step 3: selecting a plurality of vectors from the vector data set as initial cluster centers, clustering the users according to the initial cluster centers, updating the cluster centers according to the clustering result, and continuing to cluster the users according to the updated cluster centers until the cluster centers no longer change, thereby obtaining a plurality of clusters, each containing at least one user to be clustered;
step 301: selecting k vectors from the vector data set as an initial clustering center by using an improved initial clustering center method;
step 302: dividing all vectors into cluster balls represented by cluster centers closest to the vectors according to a nearest principle;
step 303: calculating the mean value of all vectors in each cluster ball to serve as a new clustering center, and calculating the radius of each cluster ball, wherein the radius is defined as the distance between the data point farthest from the central point in each cluster ball and the central point;
Step 304: finding out the neighbor cluster of each cluster according to the distance relationship between the cluster sphere radius and the cluster center, wherein the neighbor cluster is defined as the neighbor cluster if half of the distance between the center points of the two clusters is smaller than the radius of the current cluster;
step 305: calculating the distance between each vector and the center of the neighboring cluster in which the vector is positioned, filtering out data in the stable domain in each cluster ball, and dividing the data outside the stable domain in each cluster ball into the cluster balls with the nearest distance according to the principle of proximity; the stable domain is defined as a set formed by data points which are close to the clustering center and still belong to the current ball cluster in the iteration;
step 306: and repeating the steps 303 to 305 until the clustering center is not changed any more, and outputting a clustering result.
2. The user clustering method of the social network as claimed in claim 1, wherein vectorizing each piece of account information in the social network comprises converting one part of the data of each piece of account information into a digital vector, and converting another part of the data into a semantic vector by the sentence-to-vector (sen2vec) method, including converting the words in the semantic information using pre-trained word vectors, weighting all the words, and then processing the whole semantic information set using principal component analysis to obtain a vector representation of each piece of semantic information; and concatenating the digital vector and the semantic vector to obtain the vector corresponding to the account information of the user to be clustered.
3. The method for clustering users in a social network as claimed in claim 1, wherein said step 301 comprises:
step 3011: randomly selecting one vector data from the vector data set as a first initial clustering center;
step 3012: sampling a Markov chain of length 3k from the vector data set using the Markov chain Monte Carlo (MCMC) method, and taking the 3k data points on the Markov chain as candidate initial cluster centers;
step 3013: for the 3k candidate initial cluster centers, repeatedly merging the two closest candidates into a new initial cluster center using Prim's minimum spanning tree method, until only k remain as the initial cluster centers.
4. A user clustering apparatus for a social network, comprising:
the data interface module is used for accessing account information of each user to be clustered in the social network, and the account information comprises a user ID, user region, user gender, follower count, following count, content tags and posting time;
the data shaping module is used for shaping the account information accessed by the data interface module to form a vector data set;
the clustering result module is used for processing the vectors in the vector data set: selecting a plurality of vectors from the vector data set as initial cluster centers, clustering the users according to the initial cluster centers, updating the cluster centers according to the clustering result, and continuing to cluster the users according to the updated cluster centers until the cluster centers no longer change, thereby obtaining a plurality of clusters, each containing at least one user to be clustered;
the clustering result module specifically executes the following steps:
step 301: selecting k vectors from the vector data set as an initial clustering center by using an improved initial clustering center method;
step 302: dividing all vectors into cluster balls represented by cluster centers closest to the vectors according to a nearest principle;
step 303: calculating the mean value of all vectors in each cluster ball to serve as a new clustering center, and calculating the radius of each cluster ball, wherein the radius is defined as the distance between the data point farthest from the central point in each cluster ball and the central point;
step 304: finding out the neighbor cluster of each cluster according to the distance relationship between the cluster sphere radius and the cluster center, wherein the neighbor cluster is defined as the neighbor cluster if half of the distance between the center points of the two clusters is smaller than the radius of the current cluster;
Step 305: calculating the distance between each vector and the center of the adjacent cluster, filtering data in a stable domain in each cluster ball, and dividing the data outside the stable domain in each cluster ball into cluster balls with the closest distance according to the principle of proximity; the stable domain is defined as a set formed by data points which are close to the clustering center and still belong to the current cluster ball in the iteration of the current round;
step 306: repeating steps 303 to 305 until the clustering centers no longer change, and outputting the clustering result.
5. The social network user clustering device according to claim 4, wherein the data shaping module comprises a digital vector generation unit and a semantic vector generation unit, the digital vector generation unit is configured to convert one part of the data of the account information into a digital vector, and the semantic vector generation unit is configured to convert another part of the data of the account information into a semantic vector.
6. The user clustering device of the social network according to claim 4, wherein the clustering result module comprises an initial clustering center selecting unit, a calculating unit, and a result operation unit; the initial clustering center selecting unit is used for selecting the initial clustering centers in the vector data set; the calculating unit is used for calculating the radius of each cluster ball and the distance relation between any two cluster balls; and the result operation unit is used for dividing the clustering result according to the clustering centers.
7. The social network user clustering device according to claim 4, wherein the initial clustering center selecting unit comprises a random selecting subunit, a fixed selecting subunit and a merging unit; the random selecting subunit is used for randomly selecting one initial clustering center, the fixed selecting subunit is used for sampling a plurality of initial clustering centers from the vector data set according to a Markov chain Monte Carlo method, and the merging unit is used for merging the two closest initial clustering centers into a new initial clustering center according to Prim's minimum spanning tree method.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor when executing the program implements the method of any of claims 1 to 3.
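Steps 301 to 306 recited for the clustering result module can be illustrated with a minimal Python sketch. It is not the patented implementation. The function name `ball_kmeans` and the parameters `max_iter` and `seed` are hypothetical. Plain random sampling stands in for the "improved initial clustering center method" of step 301, which the claim does not detail here. The stable domain is interpreted, as an assumption, as the set of points whose distance to their own center is less than half the distance to the nearest neighbor-cluster center.

```python
import numpy as np

def ball_kmeans(X, k, max_iter=100, seed=0):
    """Illustrative sketch of steps 301-306; names and defaults are assumptions."""
    rng = np.random.default_rng(seed)
    # Step 301 (stand-in): random sampling replaces the improved initialization.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    # Step 302: assign every vector to the ball of its nearest center.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    for _ in range(max_iter):
        # Step 303: new center = mean of the ball, radius = farthest member.
        new_centers = centers.copy()
        radii = np.zeros(k)
        for c in range(k):
            members = X[labels == c]
            if len(members):
                new_centers[c] = members.mean(axis=0)
                radii[c] = np.linalg.norm(members - new_centers[c], axis=1).max()
        # Step 306: stop once the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
        center_dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
        new_labels = labels.copy()
        for c in range(k):
            # Step 304: j is a neighbor of c if half the center distance < radius of c.
            neighbors = [j for j in range(k)
                         if j != c and center_dist[c, j] / 2 < radii[c]]
            if not neighbors:
                continue  # no neighbor: the whole ball is stable
            idx = np.flatnonzero(labels == c)
            d_own = np.linalg.norm(X[idx] - centers[c], axis=1)
            # Step 305: points inside the stable domain are skipped entirely.
            stable_radius = center_dist[c, neighbors].min() / 2
            active = idx[d_own >= stable_radius]
            if len(active) == 0:
                continue
            # Remaining points are compared only with their own and neighbor centers.
            candidates = np.array([c] + neighbors)
            d = np.linalg.norm(X[active][:, None, :] - centers[candidates][None, :, :], axis=2)
            new_labels[active] = candidates[d.argmin(axis=1)]
        labels = new_labels
    return labels, centers
```

Calling `labels, centers = ball_kmeans(X, k=3)` on an array `X` of user vectors returns one cluster index per user. Because a point can only move to a neighbor cluster, the neighbor and stable-domain filters skip distance computations without changing the assignment that a full nearest-center search would produce.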
CN201911247467.5A 2019-12-09 2019-12-09 User clustering method and device of social network and computer equipment Active CN111125469B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911247467.5A CN111125469B (en) 2019-12-09 2019-12-09 User clustering method and device of social network and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911247467.5A CN111125469B (en) 2019-12-09 2019-12-09 User clustering method and device of social network and computer equipment

Publications (2)

Publication Number Publication Date
CN111125469A (en) 2020-05-08
CN111125469B (en) 2022-06-10

Family

ID=70497803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911247467.5A Active CN111125469B (en) 2019-12-09 2019-12-09 User clustering method and device of social network and computer equipment

Country Status (1)

Country Link
CN (1) CN111125469B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508512B (en) * 2020-11-26 2022-09-09 国网河北省电力有限公司经济技术研究院 Power grid engineering cost data management method and device and terminal equipment
CN112651487B (en) * 2020-12-21 2021-07-27 广东交通职业技术学院 Data recommendation method, system and medium based on graph collapse convolution neural network
CN112948732B (en) * 2021-01-14 2023-08-22 西安交通大学 Outlier detection method based on normalized minimum spanning tree clustering
CN116362737B (en) * 2023-05-29 2023-10-13 杭州数云信息技术有限公司 Account clustering method and device, computer readable storage medium and terminal
CN117574189B (en) * 2024-01-16 2024-05-03 东北师范大学 Social network user constraint clustering method and system based on Markov random field

Citations (7)

Publication number Priority date Publication date Assignee Title
WO2008022341A2 (en) * 2006-08-18 2008-02-21 The Regents Of The University Of California Seeding method for k-means clustering and other clustering algorithms
CN104850629A (en) * 2015-05-21 2015-08-19 杭州天宽科技有限公司 Analysis method of massive intelligent electricity-consumption data based on improved k-means algorithm
JP2016224915A (en) * 2015-06-03 2016-12-28 富士通株式会社 Data clustering method, information processor, and data clustering program
CN107895303A (en) * 2017-11-15 2018-04-10 电子科技大学 A kind of method of the personalized recommendation based on OCEAN models
CN109858534A (en) * 2019-01-18 2019-06-07 郑州云海信息技术有限公司 A kind of Network anomalous behaviors determine method, apparatus, equipment and readable storage medium storing program for executing
CN109918562A (en) * 2019-01-18 2019-06-21 重庆邮电大学 A kind of recommended method based on communities of users and scoring joint community
CN110378402A (en) * 2019-07-09 2019-10-25 北京工业大学 A kind of K-means clustering method of self study attribute weight

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10162878B2 (en) * 2015-05-21 2018-12-25 Tibco Software Inc. System and method for agglomerative clustering

Non-Patent Citations (4)

Title
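Peter O. Olukanmi et al. K-means-sharp: Modified centroid update for outlier-robust k-means clustering. 2017 Pattern Recognition Association of South Africa and Robotics and Mechatronics (PRASA-RobMech). 2018. *
S. Xia et al. Granular ball computing classifiers for efficient, scalable and robust learning. Information Sciences. 2019, Vol. 483. *
Gao Huiying (高慧颖) et al. Research on a friend recommendation method based on user clustering and dynamic interaction trust relationships. Data Analysis and Knowledge Discovery (《数据分析与知识发现》). 2019, Vol. 3 (No. 10). *
Gao Yue (高月) et al. Density peak clustering based on shared k-nearest neighbors and shared reverse nearest neighbors. Computer Engineering and Applications (《计算机工程与应用》). 2019, Vol. 55 (No. 20). *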

Also Published As

Publication number Publication date
CN111125469A (en) 2020-05-08

Similar Documents

Publication Publication Date Title
CN111125469B (en) User clustering method and device of social network and computer equipment
Wang et al. Time series feature learning with labeled and unlabeled data
Lee et al. Self-attention graph pooling
Lai et al. Instance-aware hashing for multi-label image retrieval
Zhu et al. Robust joint graph sparse coding for unsupervised spectral feature selection
Deng et al. A survey on soft subspace clustering
US9053392B2 (en) Generating a hierarchy of visual pattern classes
Huang et al. Object-location-aware hashing for multi-label image retrieval via automatic mask learning
CN113850281B (en) MEANSHIFT optimization-based data processing method and device
CN108595688A (en) Across the media Hash search methods of potential applications based on on-line study
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
CN105631416A (en) Method for carrying out face recognition by using novel density clustering
CN110598022B (en) Image retrieval system and method based on robust deep hash network
Altintakan et al. Towards effective image classification using class-specific codebooks and distinctive local features
CN110738053A (en) News theme recommendation algorithm based on semantic analysis and supervised learning model
Yang et al. A shapelet learning method for time series classification
Li et al. SDE: A novel clustering framework based on sparsity-density entropy
Zhao et al. TUCH: Turning Cross-view Hashing into Single-view Hashing via Generative Adversarial Nets.
Bedalli et al. A heterogeneous cluster ensemble model for improving the stability of fuzzy cluster analysis
Zhang et al. Multiview semantic representation for visual recognition
Zhang et al. Locality-constrained affine subspace coding for image classification and retrieval
Zhang et al. Cosine: compressive network embedding on large-scale information networks
Roy A classification algorithm for high-dimensional data
CN107273842B (en) Selective integrated face recognition method based on CSJOGA algorithm
CN110209895B (en) Vector retrieval method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant