CN111125469A - User clustering method and device for social network and computer equipment - Google Patents
User clustering method and device for social network and computer equipment
- Publication number
- CN111125469A
- Authority
- CN
- China
- Prior art keywords
- clustering
- user
- vector
- cluster
- social network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Evolutionary Biology (AREA)
- Economics (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the field of machine learning and data mining, and in particular relates to a user clustering method and device for a social network, and to computer equipment. The method comprises: obtaining account information of each user to be clustered in the social network, the account information comprising a user ID, user region, user gender, follower count, following count, content tags and posting time; vectorizing each piece of account information to form a vector data set; selecting a plurality of vectors from the vector data set as initial cluster centers, clustering the users according to the initial cluster centers, updating the cluster centers according to the clustering result, and continuing to cluster the users according to the updated cluster centers until the cluster centers no longer change, thereby obtaining a plurality of clusters, each containing at least one user to be clustered. The improved initial cluster center selection algorithm and the improved clustering framework greatly reduce the amount of computation, thereby improving the efficiency of the whole method and device.
Description
Technical Field
The invention belongs to the field of machine learning and data mining. It relates to a fast and efficient k-means clustering algorithm and its application in social networks, and in particular to a user clustering method and device for a social network and to computer equipment.
Background
Social network analysis is a product of combining social science and natural science. Research on social networks covers networks such as e-mail, WeChat, QQ, Sina Weibo, Twitter and Facebook; these networks contain many kinds of objects, which need to be classified. The k-means clustering algorithm is one of the most common, simple and effective clustering algorithms. The standard k-means algorithm was proposed independently, in different scientific fields, by Steinhaus in 1956, Lloyd in 1957, Ball & Hall in 1965, and MacQueen in 1967. Cluster analysis is a technique for statistical data analysis and is widely used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.
The traditional k-means algorithm performs well on small batches of data, in both efficiency and clustering quality. In a social network, however, massive amounts of user data must be processed, so the efficiency of the clustering algorithm is critical. Traditional clustering algorithms have been the main technical support in the past, but they become inefficient in big-data scenarios: convergence is extremely slow, time complexity is high, the algorithms are sensitive to noise and outliers, and the clustering result depends on the initial cluster centers.
Improvements addressing the problems of the traditional k-means algorithm fall into three main categories: first, selection of the initial cluster centers; second, approximate k-means; and third, accelerated k-means. David Arthur et al. proposed k-means++, an initial cluster center selection method based on $D^2$-sampling. Its core idea is that the initial centroids should be as far apart from one another as possible. Although this remedies the standard k-means algorithm's random choice of initial centers, the method is inherently sequential, which limits its scalability: it cannot be parallelized and applied to very large data sets.
When clustering massive data, approximate k-means is a very effective approach. In recent years researchers have proposed a number of approximate k-means methods from different perspectives. One stores the data points in a k-d tree and maintains a subset of candidate centers for each node of the tree, which reduces computation time by avoiding comparing each point with every center. Another is based on sub-sampling the data points: it runs k-means on the sub-sampled points, and an extension adds the remaining points incrementally and reruns k-means to obtain a finer clustering. Such approaches are unsuitable for many applications; in social network applications, for example, their clustering is less accurate and performs poorly.
There are also many accelerated k-means methods that remain exact, but in the context of massive social network data they still suffer from problems such as extra time and space consumption and a lack of adaptivity.
Disclosure of Invention
The invention aims to solve the efficiency problem of clustering massive data in social network scenarios. To that end it provides a fast and efficient partition-based k-means clustering algorithm and, in particular, a user clustering method and device for a social network and computer equipment.
A method of clustering users of a social network, the method comprising:
step 1: acquiring account information of each user to be clustered in the social network, wherein the account information comprises a user ID, user region, user gender, follower count, following count, content tags and posting time;
step 2: vectorizing each piece of account information in the social network to form a vector data set;
step 3: selecting a plurality of vectors from the vector data set as initial cluster centers, clustering the users according to the initial cluster centers, updating the cluster centers according to the clustering result, and continuing to cluster the users according to the updated cluster centers until the cluster centers no longer change, thereby obtaining a plurality of clusters, each containing at least one user to be clustered.
In another aspect, the present invention further provides a user clustering device for a social network, including:
the data interface module is used for accessing account information of each user to be clustered in the social network;
the data shaping module is used for shaping the account information accessed by the data interface module to form a vector data set;
and the clustering result module is used for processing the vectors in the vector data set to obtain a plurality of clustered classifications, and each classification at least comprises one user to be clustered.
The invention has the beneficial effects that:
When processing massive social network data, the invention vectorizes user account information through data preprocessing, and the improved initial cluster center selection method makes the final clustering result more accurate. In the improved k-means framework, the concept of a "core domain" is defined and the neighbor clusters of each cluster sphere are found, so all computation is confined to a small range; this saves computation that would otherwise be required and improves the efficiency of clustering massive social network data. In theory, the invention reduces the time complexity of each iteration from the original $O(nk)$ to $O(k^2 + n)$. For very large-scale social network user clustering, the method and device greatly reduce the amount of computation and thus improve overall efficiency.
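As a rough illustration of the two per-iteration counts, take the larger data set reported later in this description ($n = 6.2$ million users) together with an assumed $k = 100$ clusters. The value of $k$ is hypothetical, since the source does not state it, and constant factors are ignored.

```latex
\begin{align*}
nk &= 6.2 \times 10^{6} \times 100 = 6.2 \times 10^{8},\\
k^{2} + n &= 10^{4} + 6.2 \times 10^{6} = 6.21 \times 10^{6},\\
\frac{nk}{k^{2} + n} &\approx 99.8 .
\end{align*}
```

Under these assumptions the operation count per iteration drops by about two orders of magnitude. This is a count of operations only, not a prediction of measured speed-up.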
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only a simple schematic diagram of the present invention.
FIG. 1 is an overall flow diagram of one embodiment of the present invention;
FIG. 2 is an overall flow diagram of another embodiment of the present invention;
FIG. 3 is a flow diagram of an extraction and processing of social networking data;
FIG. 4 is a detailed flow diagram of an improved k-means clustering framework in an embodiment of the present invention;
FIG. 5 is a detailed flow chart of an improved k-means clustering framework in another embodiment of the present invention;
FIG. 6 is a schematic diagram of a user clustering device of a social network according to the present invention;
FIG. 7 is a schematic diagram of a data shaping module of the present invention;
FIG. 8 is a schematic diagram of a clustering results module of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clearly and completely apparent, the technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
As shown in fig. 1, a method for clustering users in a social network includes:
step 1: acquiring account information of each user to be clustered in the social network, wherein the account information comprises a user ID, user region, user gender, follower count, following count, content tags and posting time;
step 2: vectorizing each piece of account information in the social network to form a vector data set;
step 3: selecting a plurality of vectors from the vector data set as initial cluster centers, clustering the users according to the initial cluster centers, updating the cluster centers according to the clustering result, and continuing to cluster the users according to the updated cluster centers until the cluster centers no longer change, thereby obtaining a plurality of clusters, each containing at least one user to be clustered.
In another embodiment, as shown in fig. 2, a method for clustering users of a social network may further include:
acquiring user account information in a social network (such as Sina Weibo), including a user ID, user region, user gender, follower count, following count, microblog content tags, posting time and the like;
preprocessing user information and then vectorizing the preprocessed user information;
selecting an initial clustering center by using an improved method;
inputting all processed data sets into an improved k-means algorithm frame, and continuously iterating until the algorithm converges;
and outputting an accurate clustering result.
In one embodiment, FIG. 3 shows a flow chart for extracting and processing social network data. The method comprises the following steps:
performing data cleaning on the social network data and preprocessing the users' feature data;
converting the format of the data, wherein one part of the data is represented as a numeric vector and the other part as a semantic vector. For example, the user ID is normalized to the range 0 to 1; the user's gender is encoded numerically (for example, 1 for male and 0 for female); and the user's region is represented by a number (for example, Beijing 001, Shanghai 002, Chongqing 003, and so on). Semantic information is converted into a semantic vector by a sentence-to-vector (sen2vec) method: the words in the semantic information are converted using pre-trained word vectors, all the words are weighted, and the whole set of semantic information is then processed by principal component analysis to obtain a vector representation of each piece of semantic information. The user data is vectorized on the basis of these steps.
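The conversion described above can be sketched as follows. This is a minimal, dependency-free illustration, not the patented implementation: the field names (`id`, `gender`, `region`, `tags`), the `REGION_CODES` table, the toy word vectors and the weighting `a / (a + frequency)` are all assumptions, since the source does not specify how the words are weighted. The principal component analysis step is omitted to keep the sketch within the standard library.

```python
# Hypothetical region table following the numbering used in the text.
REGION_CODES = {"Beijing": 1, "Shanghai": 2, "Chongqing": 3}

def numeric_vector(user, max_id):
    """Encode ID (scaled into [0, 1]), gender (male=1, female=0) and region code."""
    return [
        user["id"] / max_id,
        1.0 if user["gender"] == "male" else 0.0,
        float(REGION_CODES[user["region"]]),
    ]

def semantic_vector(text, word_vecs, word_freq, a=1e-3):
    """Weighted average of the pre-trained vectors of the known words in `text`."""
    dim = len(next(iter(word_vecs.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in text.split():
        if word not in word_vecs:
            continue
        # Assumed weighting: rarer words receive a larger weight.
        w = a / (a + word_freq.get(word, 0.0))
        weight_sum += w
        for d in range(dim):
            total[d] += w * word_vecs[word][d]
    if weight_sum == 0.0:
        return total  # no known words: zero vector
    return [t / weight_sum for t in total]

def vectorize(users, word_vecs, word_freq):
    """Splice the numeric and semantic parts into one vector per user."""
    max_id = max(u["id"] for u in users)
    return [
        numeric_vector(u, max_id) + semantic_vector(u["tags"], word_vecs, word_freq)
        for u in users
    ]
```

Each user ends up as one flat list, so the result can be passed directly to a clustering routine that works on lists of coordinates.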
In one embodiment, FIG. 4 shows a detailed flow chart of the improved k-means clustering framework, covering the entire clustering process. The method comprises the following steps:
step 301: selecting k vectors from the vector data set as initial cluster centers using the improved initial cluster center selection method;
step 302: assigning every vector to the cluster sphere of its nearest cluster center, according to the nearest-neighbor principle;
step 303: computing the mean of all vectors in each cluster sphere as the new cluster center, and computing the radius of the cluster sphere;
step 304: finding the neighbor clusters of each cluster from the cluster sphere radii and the distances between cluster centers;
step 305: computing the distance between each vector and the centers of the neighbor clusters of the cluster it currently belongs to, and assigning the vector to the nearest cluster sphere;
step 306: repeating steps 303 to 305 until the cluster centers no longer change, and outputting the clustering result.
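Steps 302 to 306 can be sketched in plain Python as below. This is an illustrative reading of the steps, not the patented implementation: the function names, the Euclidean distance, the `max_iter` cap and the handling of empty clusters are assumptions. The neighbor test uses the rule stated elsewhere in this document (half the distance between two centers is smaller than the radius of the current cluster).

```python
import math

def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def neighbor_kmeans(data, centers, max_iter=100):
    """Full assignment once, then reassign only among neighbor clusters."""
    k = len(centers)
    # Step 302: assign every vector to its nearest center.
    labels = [min(range(k), key=lambda j: math.dist(x, centers[j])) for x in data]
    for _ in range(max_iter):
        # Step 303: new centers (mean of members) and radii (farthest member).
        new_centers, radii = [], []
        for j in range(k):
            members = [x for x, label in zip(data, labels) if label == j]
            # Assumption: an empty cluster keeps its previous center.
            c = mean(members) if members else centers[j]
            new_centers.append(c)
            radii.append(max((math.dist(x, c) for x in members), default=0.0))
        # Step 304: j is a neighbor of i if half the center distance is below R_i.
        neighbors = [
            [j for j in range(k)
             if j != i and math.dist(new_centers[i], new_centers[j]) / 2 < radii[i]]
            for i in range(k)
        ]
        # Step 305: compare each vector only with its own and the neighbor centers.
        labels = [
            min([label] + neighbors[label], key=lambda j: math.dist(x, new_centers[j]))
            for x, label in zip(data, labels)
        ]
        # Step 306: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

The saving comes from step 305: a vector in a cluster with few neighbors is compared with only a handful of centers instead of all k.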
In one embodiment, the step 301 comprises:
step 3011: randomly selecting one vector from the vector data set as the first initial cluster center;
step 3012: sampling a Markov chain of length 3k from the vector data set using a Markov chain Monte Carlo (MCMC) method, and taking the 3k vectors on the chain as candidate initial cluster centers;
step 3013: for the 3k candidate initial cluster centers, repeatedly merging the two closest candidates into a new initial cluster center using Prim's minimum spanning tree method, until only k remain as the initial cluster centers.
Of course, the number of vectors sampled by the MCMC method may instead be 4k, 4.5k, and so on.
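One possible reading of steps 3011 to 3013 is sketched below. The source does not give the details of the chain or of the merge, so several choices here are assumptions and are labeled as such: the chain is a Metropolis-Hastings chain with a uniform proposal whose target favors points far from the first center, two merged candidates are replaced by their midpoint, and a direct closest-pair search is swapped in for Prim's minimum spanning tree step. All function names are hypothetical.

```python
import math
import random

def mcmc_candidates(data, first, length, rng):
    """Metropolis-Hastings chain whose states favor points far from `first` (assumed target)."""
    state = rng.choice(data)
    chain = []
    for _ in range(length):
        proposal = rng.choice(data)
        d_state = math.dist(state, first) ** 2
        d_prop = math.dist(proposal, first) ** 2
        # Accept with probability min(1, d_prop / d_state).
        if d_state == 0.0 or rng.random() < d_prop / d_state:
            state = proposal
        chain.append(list(state))
    return chain

def merge_closest(candidates, k):
    """Repeatedly replace the two closest candidates by their midpoint until k remain."""
    centers = [list(c) for c in candidates]
    while len(centers) > k:
        i, j = min(
            ((a, b) for a in range(len(centers)) for b in range(a + 1, len(centers))),
            key=lambda ab: math.dist(centers[ab[0]], centers[ab[1]]),
        )
        merged = [(u + v) / 2 for u, v in zip(centers[i], centers[j])]
        centers = [c for idx, c in enumerate(centers) if idx not in (i, j)] + [merged]
    return centers

def select_initial_centers(data, k, seed=0):
    rng = random.Random(seed)
    first = rng.choice(data)                               # step 3011
    candidates = mcmc_candidates(data, first, 3 * k, rng)  # step 3012
    return merge_closest(candidates, k)                    # step 3013
```

Because only 3k candidates are merged, the cost of this stage depends on k rather than on the size of the data set.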
In one embodiment, the step 301 may further include:
randomly selecting vectors from the vector data set by reservoir sampling: the first k vectors in the vector data set are placed in the reservoir; for the m-th element (m > k), one vector in the reservoir is replaced with probability k/m; and the k vectors finally left in the reservoir are taken as the initial cluster centers.
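Reservoir sampling as described here can be sketched in a few lines. The function name and the `seed` parameter are illustrative assumptions; the replacement rule (probability k/m for the m-th element, replacing a uniformly chosen slot) follows the text.

```python
import random

def reservoir_sample(vectors, k, seed=None):
    """Pick k items uniformly at random from an iterable in a single pass."""
    rng = random.Random(seed)
    reservoir = []
    for m, vec in enumerate(vectors, start=1):
        if m <= k:
            # The first k vectors fill the reservoir.
            reservoir.append(vec)
        elif rng.random() < k / m:
            # The m-th vector replaces a random slot with probability k/m.
            reservoir[rng.randrange(k)] = vec
    return reservoir
```

Since the data is read once and only k vectors are held in memory, this variant suits data sets too large to index randomly.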
In another embodiment, as shown in fig. 5, the step 3 further includes:
step 311: first, performing a first iteration on the data set using the standard k-means procedure, assigning each data point to the cluster of its nearest center:
$$b(x_i) = \arg\min_{j = 1, \dots, k} \mathrm{dis}(x_i, c_j)$$
where $c_j$ denotes the center of cluster $C_j$ and $x_i$ is any sample in the sample space.
step 312: updating all cluster centers according to the above assignment:
$$c_j = \frac{1}{|N|} \sum_{x_i \in C_j} x_i$$
where $|N|$ denotes the number of data samples assigned to $C_j$.
step 313: calculating the radius of each cluster sphere, defined as the distance between the center and the data point in the cluster sphere farthest from it:
$$R_j = \max_{x_i \in C_j} \mathrm{dis}(x_i, c_j)$$
step 314: finding the neighbor clusters of each cluster from the radius of each cluster sphere and the distance between any two cluster spheres (another cluster is a neighbor cluster of the current cluster if half the distance between the two cluster centers is smaller than the radius of the current cluster);
step 315: sorting the neighbor clusters of each cluster from nearest to farthest;
step 316: filtering out the data in the "stable domain" of each cluster sphere (the "stable domain" is the set of data points that are close to the cluster center and still belong to the current cluster sphere in the current iteration);
step 317: assigning the data outside the "stable domain" of each cluster sphere by the nearest-neighbor principle, where the candidates for assignment are the neighbor clusters of that cluster sphere, in the order given by step 315;
step 318: repeating steps 312 to 317 until no center changes any more.
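The neighbor test in step 314 can be motivated by the triangle inequality. The short derivation below is a sketch that is not taken from the source; it assumes that dis is a metric (for example, Euclidean distance) and writes $R_i$ for the radius of cluster sphere $C_i$ with center $c_i$.

```latex
\begin{align*}
&\text{Suppose } \tfrac{1}{2}\,\mathrm{dis}(c_i, c_j) \ge R_i
  \text{ (so } C_j \text{ is not a neighbor of } C_i\text{). Then for any } x \in C_i:\\
&\mathrm{dis}(x, c_j) \;\ge\; \mathrm{dis}(c_i, c_j) - \mathrm{dis}(x, c_i)
  \;\ge\; 2R_i - R_i \;=\; R_i \;\ge\; \mathrm{dis}(x, c_i).
\end{align*}
```

No point of $C_i$ is therefore strictly closer to $c_j$ than to its own center, so leaving non-neighbor clusters out of the comparison does not change the nearest-center assignment.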
In addition, in a specific embodiment, the vector data sets used are user data crawled from Twitter and user data from Sina Weibo, containing 3.76 million and 6.2 million user records respectively. After all the social network user data were preprocessed, the two data sets were clustered under the framework of the proposed method. The results show that it is on average 51 times faster than the traditional k-means algorithm, and that the clustering result is 3.1% lower than that of the traditional social network user clustering algorithm (the evaluation metric is WCSSD, the within-cluster sum of squares; the smaller, the better).
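For reference, the within-cluster sum of squares used as the evaluation metric can be computed as in the sketch below. The function name and argument layout are assumptions: `data` is a list of vectors, `centers` a list of cluster centers, and `labels[i]` the index of the center assigned to `data[i]`.

```python
def wcss(data, centers, labels):
    """Within-cluster sum of squares: smaller values mean tighter clusters."""
    return sum(
        sum((x_d - c_d) ** 2 for x_d, c_d in zip(x, centers[label]))
        for x, label in zip(data, labels)
    )
```

Two clusterings of the same data with the same number of clusters can be compared directly by this value.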
Based on the same concept of the present invention, the present invention further provides a user clustering device for a social network, as shown in fig. 6, including:
the data interface module is used for accessing account information of each user to be clustered in the social network;
the data shaping module is used for shaping the account information accessed by the data interface module to form a vector data set;
and the clustering result module is used for processing the vectors in the vector data set to obtain a plurality of clustered classifications, and each classification at least comprises one user to be clustered.
As shown in fig. 7, the data shaping module includes a numeric vector generation unit for converting one part of the account information into a numeric vector, and a semantic vector generation unit for converting another part of the account information into a semantic vector.
As shown in fig. 8, the clustering result module includes an initial cluster center selection unit, a calculation unit and a result calculation unit; the initial cluster center selection unit is used for selecting initial cluster centers from the vector data set; the calculation unit is used for calculating the radius of each cluster sphere and the distance between any two cluster spheres; and the result calculation unit is used for dividing the clustering result according to the cluster centers.
The initial cluster center selection unit comprises a random selection subunit, a fixed selection subunit and a merging unit; the random selection subunit is used for randomly selecting one initial cluster center, the fixed selection subunit is used for sampling a plurality of candidate initial cluster centers from the vector data set by a Markov chain Monte Carlo method, and the merging unit is used for merging the two closest initial cluster centers into a new initial cluster center by Prim's minimum spanning tree method.
In addition, the invention also provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the user clustering method provided by the invention when executing the program.
In one embodiment, the invention is implemented in the Python programming language and can run on mainstream computer platforms. The operating system used in the implementation is CentOS 6.5; the CPU is required to be an Intel i5, the memory more than 8 GB, and the hard disk space more than 32 GB.
It is understood that features of the method, device and computer equipment of the invention may be cross-referenced, and they are not described again in detail here.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.
The above embodiments further illustrate the objects, technical solutions and advantages of the invention. It should be understood that they are only preferred embodiments and should not be construed as limiting the invention; any modifications, equivalents, improvements and the like made within the spirit and principle of the invention shall fall within its scope of protection.
Claims (10)
1. A method for clustering users of a social network, the method comprising:
step 1: acquiring account information of each user to be clustered in the social network, wherein the account information comprises a user ID, user region, user gender, follower count, following count, content tags and posting time;
step 2: vectorizing each piece of account information in the social network to form a vector data set;
step 3: selecting a plurality of vectors from the vector data set as initial cluster centers, clustering the users according to the initial cluster centers, updating the cluster centers according to the clustering result, and continuing to cluster the users according to the updated cluster centers until the cluster centers no longer change, thereby obtaining a plurality of clusters, each containing at least one user to be clustered.
2. The user clustering method of the social network as claimed in claim 1, wherein vectorizing each piece of account information in the social network comprises: digitizing one part of each piece of account information into a numeric vector; converting another part into a semantic vector by a sentence-to-vector (sen2vec) method, which comprises converting the words in the semantic information using pre-trained word vectors, weighting all the words, and then processing the whole set of semantic information by principal component analysis to obtain a vector representation of each piece of semantic information; and splicing the numeric vector and the semantic vector to obtain the vector corresponding to the account information of the user to be clustered.
3. The method for clustering users in a social network according to claim 1, wherein the step 3 comprises the steps of:
step 301: selecting k vectors from the vector data set as initial cluster centers using the improved initial cluster center selection method;
step 302: assigning every vector to the cluster sphere of its nearest cluster center, according to the nearest-neighbor principle;
step 303: computing the mean of all vectors in each cluster sphere as the new cluster center, and computing the radius of the cluster sphere;
step 304: finding the neighbor clusters of each cluster from the cluster sphere radii and the distances between cluster centers;
step 305: computing the distance between each vector and the centers of the neighbor clusters of the cluster it currently belongs to, and assigning the vector to the nearest cluster sphere;
step 306: repeating steps 303 to 305 until the cluster centers no longer change, and outputting the clustering result.
4. The method for clustering users in a social network according to claim 3, wherein the step 301 comprises:
step 3011: randomly selecting one vector from the vector data set as the first initial cluster center;
step 3012: sampling a Markov chain of length 3k from the vector data set using a Markov chain Monte Carlo (MCMC) method, and taking the 3k vectors on the chain as candidate initial cluster centers;
step 3013: for the 3k candidate initial cluster centers, repeatedly merging the two closest candidates into a new initial cluster center using Prim's minimum spanning tree method, until only k remain as the initial cluster centers.
5. The method of claim 3, wherein finding the neighbor clusters of each cluster comprises treating another cluster as a neighbor cluster of the current cluster if half the distance between the center points of the two clusters is smaller than the radius of the current cluster.
6. A user clustering apparatus for a social network, comprising:
the data interface module is used for accessing account information of each user to be clustered in the social network;
the data shaping module is used for shaping the account information accessed by the data interface module to form a vector data set;
and the clustering result module is used for processing the vectors in the vector data set to obtain a plurality of clustered classifications, and each classification at least comprises one user to be clustered.
7. The social network user clustering device according to claim 6, wherein the data shaping module comprises a numeric vector generation unit and a semantic vector generation unit, the numeric vector generation unit is configured to convert one part of the account information into a numeric vector, and the semantic vector generation unit is configured to convert another part of the account information into a semantic vector.
8. The user clustering device of the social network according to claim 6, wherein the clustering result module comprises an initial cluster center selection unit, a calculation unit and a result calculation unit; the initial cluster center selection unit is used for selecting initial cluster centers from the vector data set; the calculation unit is used for calculating the radius of each cluster sphere and the distance between any two cluster spheres; and the result calculation unit is used for dividing the clustering result according to the cluster centers.
9. The social network user clustering device according to claim 6, wherein the initial cluster center selection unit comprises a random selection subunit, a fixed selection subunit and a merging unit; the random selection subunit is used for randomly selecting one initial cluster center, the fixed selection subunit is used for sampling a plurality of candidate initial cluster centers from the vector data set by a Markov chain Monte Carlo method, and the merging unit is used for merging the two closest initial cluster centers into a new initial cluster center by Prim's minimum spanning tree method.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor when executing the program implements the method of any of claims 1 to 5.
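The merging unit of claim 9 and the calculating unit of claim 8 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names are hypothetical, and three rules are assumptions the claims do not specify: two centers are merged into their midpoint, the radius of a cluster ball is taken as the largest distance from its center to a member, and the "distance relation" between two balls is reduced to an overlap test. The closest pair of centers is read off as the lightest edge of a minimum spanning tree built with Prim's algorithm.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def prim_mst_edges(points: List[Point]) -> List[Tuple[float, int, int]]:
    """Edges (weight, i, j) of a minimum spanning tree, built with Prim's
    algorithm over the complete Euclidean graph on `points`."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known edge from the tree to each node
    best_from = [-1] * n        # tree-side endpoint of that cheapest edge
    best_dist[0] = 0.0          # start the tree at node 0
    edges = []
    for _ in range(n):
        # Attach the non-tree node that is cheapest to reach.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] != -1:
            edges.append((best_dist[u], best_from[u], u))
        # Update the cheapest connection of every remaining node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def merge_closest_centers(centers: List[Point]) -> List[List[float]]:
    """Replace the two closest centers by their midpoint (assumed merge rule)."""
    if len(centers) < 2:
        return [list(c) for c in centers]
    # The lightest MST edge joins a closest pair of centers.
    _, i, j = min(prim_mst_edges(centers))
    merged = [(a + b) / 2 for a, b in zip(centers[i], centers[j])]
    kept = [list(c) for k, c in enumerate(centers) if k not in (i, j)]
    return kept + [merged]


def ball_radius(center: Point, members: List[Point]) -> float:
    """Radius of a cluster ball, assumed to be the largest member distance."""
    return max((math.dist(center, m) for m in members), default=0.0)


def balls_overlap(c1: Point, r1: float, c2: Point, r2: float) -> bool:
    """True when the center distance does not exceed the sum of the radii."""
    return math.dist(c1, c2) <= r1 + r2
```

Calling `merge_closest_centers` once reduces the number of initial centers by one, so it can be repeated until the desired number of centers remains. `math.dist` requires Python 3.8 or later.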
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911247467.5A CN111125469B (en) | 2019-12-09 | 2019-12-09 | User clustering method and device of social network and computer equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911247467.5A CN111125469B (en) | 2019-12-09 | 2019-12-09 | User clustering method and device of social network and computer equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111125469A (en) | 2020-05-08 |
CN111125469B (en) | 2022-06-10 |
Family
ID=70497803
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911247467.5A Active CN111125469B (en) | 2019-12-09 | 2019-12-09 | User clustering method and device of social network and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111125469B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008022341A2 (en) * | 2006-08-18 | 2008-02-21 | The Regents Of The University Of California | Seeding method for k-means clustering and other clustering algorithms |
CN104850629A (en) * | 2015-05-21 | 2015-08-19 | 杭州天宽科技有限公司 | Analysis method of massive intelligent electricity-consumption data based on improved k-means algorithm |
US20160342677A1 (en) * | 2015-05-21 | 2016-11-24 | Dell Products, Lp | System and Method for Agglomerative Clustering |
JP2016224915A (en) * | 2015-06-03 | 2016-12-28 | 富士通株式会社 | Data clustering method, information processor, and data clustering program |
CN107895303A (en) * | 2017-11-15 | 2018-04-10 | 电子科技大学 | Personalized recommendation method based on the OCEAN model |
CN109858534A (en) * | 2019-01-18 | 2019-06-07 | 郑州云海信息技术有限公司 | Method, apparatus, and device for determining abnormal network behavior, and readable storage medium |
CN109918562A (en) * | 2019-01-18 | 2019-06-21 | 重庆邮电大学 | Recommendation method based on user communities and joint scoring communities |
CN110378402A (en) * | 2019-07-09 | 2019-10-25 | 北京工业大学 | K-means clustering method with self-learned attribute weights |
Non-Patent Citations (4)
Title |
---|
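PETER O. OLUKANMI et al.: "K-means-sharp: Modified centroid update for outlier-robust k-means clustering", 2017 PATTERN RECOGNITION ASSOCIATION OF SOUTH AFRICA AND ROBOTICS AND MECHATRONICS (PRASA-ROBMECH), 18 January 2018 (2018-01-18), pages 14 - 19 * |
S. XIA et al.: "Granular ball computing classifiers for efficient, scalable and robust learning", INFORMATION SCIENCES, vol. 483, 31 May 2019 (2019-05-31), pages 136 - 152 * |
GAO Huiying (高慧颖) et al.: "Research on a friend recommendation method based on user clustering and dynamic interaction trust relationships" (基于用户聚类与动态交互信任关系的好友推荐方法研究), Data Analysis and Knowledge Discovery (数据分析与知识发现), vol. 3, no. 10, 25 October 2019 (2019-10-25), pages 66 - 77 * |
GAO Yue (高月) et al.: "Density peak clustering based on shared k-nearest neighbors and shared reverse nearest neighbors" (基于共享k-近邻与共享逆近邻的密度峰聚类), Computer Engineering and Applications (计算机工程与应用), vol. 55, no. 20, 31 May 2019 (2019-05-31), pages 43 - 51 * |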
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112508512A (en) * | 2020-11-26 | 2021-03-16 | 国网河北省电力有限公司经济技术研究院 | Power grid engineering cost data management method and device and terminal equipment |
CN112651487A (en) * | 2020-12-21 | 2021-04-13 | 广东交通职业技术学院 | Data recommendation method, system and medium based on graph collapse convolution neural network |
CN112651487B (en) * | 2020-12-21 | 2021-07-27 | 广东交通职业技术学院 | Data recommendation method, system and medium based on graph collapse convolution neural network |
CN112948732A (en) * | 2021-01-14 | 2021-06-11 | 西安交通大学 | Outlier detection method based on normalized minimum spanning tree clustering |
CN112948732B (en) * | 2021-01-14 | 2023-08-22 | 西安交通大学 | Outlier detection method based on normalized minimum spanning tree clustering |
CN115130123A (en) * | 2022-06-24 | 2022-09-30 | 苏州浪潮智能科技有限公司 | Multi-party privacy cooperation-based federated learning method and system for k-means clustering |
CN116362737A (en) * | 2023-05-29 | 2023-06-30 | 杭州数云信息技术有限公司 | Account clustering method and device, computer readable storage medium and terminal |
CN116362737B (en) * | 2023-05-29 | 2023-10-13 | 杭州数云信息技术有限公司 | Account clustering method and device, computer readable storage medium and terminal |
CN117574189A (en) * | 2024-01-16 | 2024-02-20 | 东北师范大学 | Social network user constraint clustering method and system based on Markov random field |
CN117574189B (en) * | 2024-01-16 | 2024-05-03 | 东北师范大学 | Social network user constraint clustering method and system based on Markov random field |
Also Published As
Publication number | Publication date |
---|---|
CN111125469B (en) | 2022-06-10 |
Similar Documents
Publication | Title |
---|---|
CN111125469B (en) | User clustering method and device of social network and computer equipment |
Wang et al. | Time series feature learning with labeled and unlabeled data | |
Zhu et al. | Exploring auxiliary context: discrete semantic transfer hashing for scalable image retrieval | |
US9053392B2 (en) | Generating a hierarchy of visual pattern classes | |
CN110046634B (en) | Interpretation method and device of clustering result | |
CN107066555B (en) | On-line theme detection method for professional field | |
CN113850281B (en) | MEANSHIFT optimization-based data processing method and device | |
Huang et al. | Object-location-aware hashing for multi-label image retrieval via automatic mask learning | |
CN111898703B (en) | Multi-label video classification method, model training method, device and medium | |
CN109508379A | Short text clustering method based on weighted word vector representation and combined similarity |
CN108280236B (en) | Method for analyzing random forest visual data based on LargeVis | |
CN108197144B (en) | Hot topic discovery method based on BTM and Single-pass | |
CN114283350B (en) | Visual model training and video processing method, device, equipment and storage medium | |
CN110751027B (en) | Pedestrian re-identification method based on deep multi-instance learning | |
CN105631416A (en) | Method for carrying out face recognition by using novel density clustering | |
CN110598022B (en) | Image retrieval system and method based on robust deep hash network | |
Yang et al. | A shapelet learning method for time series classification | |
CN108427745A | Image retrieval method based on an optimized visual dictionary and adaptive soft assignment |
Altintakan et al. | Towards effective image classification using class-specific codebooks and distinctive local features | |
CN109522432B (en) | Image retrieval method integrating adaptive similarity and Bayes framework | |
CN113010705A (en) | Label prediction method, device, equipment and storage medium | |
CN112668482A (en) | Face recognition training method and device, computer equipment and storage medium | |
CN115982144A (en) | Similar text duplicate removal method and device, storage medium and electronic device | |
Zhang et al. | Locality-constrained affine subspace coding for image classification and retrieval | |
Zhang et al. | Dataset-driven unsupervised object discovery for region-based instance image retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||