CN106096052A - A consumer clustering method for WeChat marketing - Google Patents
- Publication number
- CN106096052A
- Authority
- CN
- China
- Prior art keywords
- cluster
- data
- consumer
- individual
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention relates to the field of social network data processing, and in particular to a consumer clustering method for WeChat marketing. The method comprises the following steps: 1) scan the full data set and randomly sample it a fixed number of times; 2) run the k-means algorithm on the sample data set from each sampling to obtain one group of cluster centres, so that the repeated samplings yield one group of cluster centres per sample; 3) use the error-sum-of-squares criterion function to find the optimal group of cluster centres and output it; 4) taking the optimal cluster centres found in step 3) as the initial cluster centres, and the initial number of clusters (larger than the final number of clusters) as the input parameter, run the k-means algorithm on the full data set; 5) among the resulting clusters, merge the two that are nearest to each other and recalculate the cluster centre after the merge, repeating until the number of clusters is reduced to the final number, at which point merging stops and the algorithm ends. The method improves the speed and stability of the consumer data clustering process.
Description
Technical field
The invention relates to the field of social network data processing, and in particular to a consumer clustering method for WeChat marketing.
Background
The k-means algorithm is one of the most commonly used data clustering algorithms. The number of classes k into which the data is to be divided is set in advance as the input parameter, and the data set is partitioned into k clusters: each data object is assigned to a cluster according to its Euclidean distance to each cluster centre. Data objects in the same cluster are highly similar to one another, while data objects in different clusters have low similarity. The concrete steps of the k-means algorithm are as follows. First, according to the input parameter k, k data objects are selected at random from the data set as the cluster centres. The Euclidean distance from each remaining data object to each cluster centre is calculated, the distances are compared, and the data object is assigned to the cluster whose centre is nearest. Then the cluster centre of each cluster is recalculated, the Euclidean distance from each data object to each cluster centre is calculated again, and each data object is reassigned to the cluster with the smallest distance. This process is repeated until the cluster centres no longer change or change only slightly. The iteration then ends and the final k clusters are output, completing the clustering of the data set.
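The steps just described can be sketched in a few lines of Python. This is a minimal illustration of the standard k-means procedure, not the patent's own implementation: the function name `kmeans`, the parameters `max_iter`, `tol` and `seed`, and the handling of empty clusters are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-6, seed=None):
    """Cluster `points` (a list of equal-length tuples) into k clusters."""
    rng = random.Random(seed)
    # Pick k data objects at random as the initial cluster centres.
    centres = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster with the nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[nearest].append(p)
        # Update step: each centre becomes the mean of its cluster's members.
        new_centres = []
        for c, members in enumerate(clusters):
            if members:
                dims = zip(*members)
                new_centres.append(tuple(sum(d) / len(members) for d in dims))
            else:
                # Assumption: an empty cluster keeps its previous centre.
                new_centres.append(centres[c])
        # Stop once no centre has moved by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift <= tol:
            break
    return centres, clusters
```

Because the initial centres are drawn at random, two runs with different seeds can end in different clusterings on harder data.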
The procedure of the traditional k-means clustering algorithm is shown in Fig. 1.
The shortcoming of the traditional k-means clustering algorithm:
It is extremely sensitive to the initial cluster centres. Because k-means selects the initial cluster centres at random, a poor selection easily traps the algorithm in a locally optimal solution rather than the globally optimal one. In particular, when the data set is unevenly distributed, edge points and extreme points may be chosen as initial points, which leads to slow iterative convergence and poorly separated clusters.
Summary of the invention
In the prior art, consumer data is processed inefficiently, and the clustering process easily falls into a local optimum and therefore fails. To solve these problems, the invention proposes a consumer clustering method for WeChat marketing that improves the speed and stability of the consumer data clustering process.
To solve the above technical problems, the invention is realised through the following technical solution:
A consumer clustering method for WeChat marketing. The information processed by the method is consumer information collected from WeChat, including: personal information filled in voluntarily by the consumer, records of the consumer's operations after following the WeChat official account, purchase behaviour information, and opinion feedback information. The method comprises the following steps:
1) Scan the full data set and randomly sample it a fixed number of times;
2) Run the k-means algorithm on the sample data set from each random sampling to obtain one group of cluster centres, so that the repeated samplings yield one group of cluster centres per sample;
3) Use the error-sum-of-squares criterion function to find the optimal group of cluster centres and output it;
4) Taking the optimal cluster centres found in step 3) as the initial cluster centres, and the initial number of clusters (larger than the final number of clusters) as the input parameter, run the k-means algorithm on the full data set;
5) Among the resulting clusters, merge the two that are nearest to each other and recalculate the cluster centre after the merge; when the number of clusters has been reduced to the final number, stop merging. The algorithm then ends.
Compared with the prior art, and owing to the above technical solution, the invention has the following advantages:
(1) It improves the speed and stability of the consumer data clustering process;
(2) Because the clustering process works on multiple data subsets (more than 3), the currently popular distributed computing can be used.
Brief description of the drawings
Fig. 1 is a flow chart of the traditional k-means algorithm.
Fig. 2 is a flow chart of the k-means algorithm as improved by the invention.
Fig. 3 shows the sample simulation data set D.
Fig. 4 shows the execution results of the two algorithms.
Detailed description
The invention is further described below with reference to the drawings and an embodiment:
The invention designs an improved k-means algorithm suited to big-data environments; its flow chart is shown in Fig. 2. The customer data for a given period (one month) is clustered using profit contribution value as the attribute. Each class of customers contains $n$ data objects in total, each data object has $m$ attribute values, and $x_{ij}$ denotes the $j$-th attribute of the $i$-th customer in the customer data set.
The purpose of clustering the customer data set $D$ collected over a given period is to obtain $K$ customer cluster sets according to customer value contribution.
The basic steps of the algorithm are as follows:
Sample the data set $D$ $J$ times, drawing the same number of customer objects each time, to form the sample sets $S_1, \dots, S_J$;
Set the number of clusters $K'$, with $K' > K$ where $K$ is the final number of clusters, and run the k-means algorithm on each sample set $S_t$ to obtain $J$ groups of $K'$ cluster centres;
For each sample set $S_t$, using the number of customers $n_k$ in each cluster $C_k$, calculate the error sum of squares of each cluster with the following formula:
$$E_k = \sum_{i=1}^{n_k} \sum_{j=1}^{m} (x_{ij} - c_{kj})^2 \qquad (1)$$
where $x_{ij}$ is the $j$-th attribute value of the $i$-th customer in the cluster, $c_{kj}$ is the value of attribute $j$ of that cluster's centre, and $m$ is the number of attributes.
Using the error-sum-of-squares criterion function, calculate the error sum of squares of each sample set $S_t$ with the following formula:
$$E(S_t) = \sum_{k=1}^{K'} E_k \qquad (2)$$
Select the group of cluster centres corresponding to the minimum value of $E(S_t)$ and output it as the initial cluster centres.
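The sampling and selection steps above can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the patent's implementation: `draw_samples`, `sum_squared_error`, `select_initial_centres` and the `cluster_fn` callback are hypothetical names, each sample is drawn independently (the text does not say whether the samples are disjoint), and the error sum of squares is computed by assigning each object to its nearest centre, which matches the per-cluster sum once k-means has converged.

```python
import math
import random


def draw_samples(data, num_samples, sample_size, seed=None):
    """Draw num_samples random subsets of data, each with sample_size objects."""
    rng = random.Random(seed)
    return [rng.sample(data, sample_size) for _ in range(num_samples)]


def sum_squared_error(points, centres):
    """Sum, over all points, the squared distance to the nearest centre.

    This equals the per-cluster error sums of squares added over all clusters
    when every point belongs to the cluster of its nearest centre.
    """
    return sum(min(math.dist(p, c) ** 2 for c in centres) for p in points)


def select_initial_centres(samples, cluster_fn, k_init):
    """Return the centre group whose sample has the smallest error sum of squares.

    cluster_fn(points, k) is a hypothetical callable that returns k centres,
    for example a k-means routine.
    """
    best_centres, best_sse = None, math.inf
    for sample in samples:
        centres = cluster_fn(sample, k_init)
        sse = sum_squared_error(sample, centres)
        if sse < best_sse:
            best_centres, best_sse = centres, sse
    return best_centres, best_sse
```

Passing the clustering routine in as `cluster_fn` keeps the selection logic separate from any particular k-means implementation.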
With the selected cluster centres as the initial cluster centres and the initial number of clusters $K'$ as the number of clusters, run the k-means algorithm on the whole of data set $D$ to obtain $K'$ clusters.
Calculate the distance between every two of these clusters (the Euclidean distance between their cluster centres) with the following formula:
$$d(C_p, C_q) = \sqrt{\sum_{j=1}^{m} (c_{pj} - c_{qj})^2} \qquad (3)$$
where $c_{pj}$ is the value of attribute $j$ of the cluster centre of cluster $C_p$, and $m$ is the number of attributes. Merge the two clusters with the smallest distance and recalculate the cluster centre after the merge. Repeat until the number of clusters is reduced to $K$ (the required final number of clusters, smaller than $K'$), then stop merging and output the $K$ cluster sets. The improved k-means algorithm then ends, yielding a clustering of data set $D$ according to the customer value contribution index.
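The merging stage can be sketched as below. This is a hedged Python illustration, not the patent's code: the names `centre_of` and `merge_to_k` are hypothetical, and the merged centre is recomputed as the mean of all members of the merged cluster, which is an assumption because the text does not state how the new centre is calculated.

```python
import math


def centre_of(cluster):
    """Mean vector of a non-empty cluster (a list of equal-length tuples)."""
    return tuple(sum(d) / len(cluster) for d in zip(*cluster))


def merge_to_k(clusters, k):
    """Merge the two clusters with the closest centres until k clusters remain."""
    clusters = [list(c) for c in clusters if c]  # copy; drop empty clusters
    centres = [centre_of(c) for c in clusters]
    while len(clusters) > k:
        # Find the pair of clusters whose centres are nearest (Euclidean distance).
        pairs = [(p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        p, q = min(pairs, key=lambda pq: math.dist(centres[pq[0]], centres[pq[1]]))
        # Merge q into p, then recompute the merged centre from all members
        # (assumption: the new centre is the mean of the merged cluster).
        clusters[p].extend(clusters[q])
        centres[p] = centre_of(clusters[p])
        del clusters[q], centres[q]
    return centres, clusters
```

With the parameters of the simulation described later (12 initial clusters, 4 final clusters), the loop would perform eight merges.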
The invention was tested in a computer simulation that compares the clustering results of the traditional k-means algorithm with those of the improved k-means algorithm described here. The simulation program was implemented in Visual C++. The computer hardware was: CPU: Intel i5 processor, 2.5 GHz; memory: 4 GB. The data sample parameters are set as shown in Table 1:
Table 1
Data sample size | Data attribute dimensions | Initial number of clusters | Final number of clusters | Number of samples drawn |
5000 | 2 | 12 | 4 | 20 |
The simulation uses the two-dimensional sample simulation data set shown in Fig. 3. The mean vectors of the four data subsets in the figure were calculated as (0.6509, 0.9582), (3.4821, 1.1241), (3.9587, 3.0213) and (1.7424, 4.2508). First, the traditional k-means algorithm was used to cluster the raw data set D, and was executed 30 times in total. The order of the input data was shuffled before each execution. Likewise, the improved k-means algorithm was executed 30 times on the raw data set D, with the order of the data read in shuffled each time. The purpose of this arrangement is to test the stability of the algorithms.
One representative clustering result was selected from the execution results of each algorithm, as shown in Fig. 4(a) and (b), where the red dots mark the position of the cluster centre in each cluster. With the traditional k-means algorithm, clustering results similar to Fig. 4(a) occurred 23 times in total. Fig. 4(a) shows a fairly typical case of falling into a local minimum. The improved k-means algorithm designed here, by contrast, stably produced clustering results similar to Fig. 4(b).
Next, the quality of the two algorithms is illustrated by comparing the average of the 30 groups of cluster centres from the clustering results with the calculated cluster centres, as shown in Table 2:
Table 2 Comparison of cluster centre values
Cluster number | Average cluster centre from the traditional k-means algorithm | Average cluster centre from the improved k-means algorithm | Calculated cluster centre |
1 | {0.5964,1.1820} | {0.6256,0.9021} | {0.6509,0.9582} |
2 | {3.0425,0.8852} | {3.3584,1.3024} | {3.4821,1.1241} |
3 | {3.3218,3.3204} | {4.1021,3.1541} | {3.9587,3.0213} |
4 | {1.6822,4.3652} | {1.7455,4.2057} | {1.7424,4.2508} |
The comparison of average cluster centres in Table 2 clearly shows that the cluster centre values produced by the improved k-means algorithm are closer to the calculated cluster centre values. With reference to Fig. 4(a) and (b), further analysis is possible. Because the raw data is unevenly distributed and the clusters differ in shape and size, the traditional k-means algorithm's practice of selecting initial cluster centres at random easily picks edge data points as initial cluster centres. When this happens, the cluster centres that the traditional k-means algorithm finally produces are very likely to be trapped in a locally optimal solution; Fig. 4(a) is a typical example of this phenomenon. The improved k-means algorithm first draws 20 equal-sized samples from the raw data (each containing 250 elements), uses the error-sum-of-squares criterion function to pick, from these 20 samples, the cluster centres that best reflect the shape and density characteristics of the full data set, feeds them into the k-means algorithm as initial cluster centres with the initial number of clusters set, and finally merges the clustering results. This makes the clustering result independent of the order in which the data is read, and also prevents the clustering from becoming trapped in a locally optimal solution.
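The claim that the improved algorithm's averaged centres lie closer to the calculated centres can be checked directly from the Table 2 values. The short Python sketch below does this arithmetic; the variable names are chosen for illustration, and the Euclidean distance is used as the measure of closeness, which is an assumption since the text does not name one.

```python
import math

# Values copied from Table 2 (cluster numbers 1 to 4).
traditional = [(0.5964, 1.1820), (3.0425, 0.8852), (3.3218, 3.3204), (1.6822, 4.3652)]
improved = [(0.6256, 0.9021), (3.3584, 1.3024), (4.1021, 3.1541), (1.7455, 4.2057)]
reference = [(0.6509, 0.9582), (3.4821, 1.1241), (3.9587, 3.0213), (1.7424, 4.2508)]

# Euclidean distance from each averaged centre to the calculated centre.
err_traditional = [math.dist(a, r) for a, r in zip(traditional, reference)]
err_improved = [math.dist(a, r) for a, r in zip(improved, reference)]

for i, (t, m) in enumerate(zip(err_traditional, err_improved), start=1):
    print(f"cluster {i}: traditional {t:.4f}, improved {m:.4f}")
```

For all four clusters the distance for the improved algorithm is smaller than the distance for the traditional algorithm.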
The consumer clustering method for WeChat marketing proposed by the invention is more stable and more accurate, and is especially suited to processing large data sources that are unevenly distributed and high in volume.
The above is only a specific embodiment of the invention, and the technical features of the invention are not limited to it. Any change or modification made by a person skilled in the art within the field of the invention falls within the scope of the claims of the invention.
Claims (2)
1. A consumer clustering method for WeChat marketing, characterised in that the information processed by the method is consumer information collected from WeChat, including: personal information filled in voluntarily by the consumer, records of the consumer's operations after following the WeChat official account, purchase behaviour information, and opinion feedback information; the method comprises the following steps:
1) Scan the full data set and randomly sample it a fixed number of times;
2) Run the k-means algorithm on the sample data set from each random sampling to obtain one group of cluster centres, so that the repeated samplings yield one group of cluster centres per sample;
3) Use the error-sum-of-squares criterion function to find the optimal group of cluster centres and output it;
4) Taking the optimal cluster centres found in step 3) as the initial cluster centres, and the initial number of clusters (larger than the final number of clusters) as the input parameter, run the k-means algorithm on the full data set;
5) Among the resulting clusters, merge the two that are nearest to each other and recalculate the cluster centre after the merge; when the number of clusters has been reduced to the final number, stop merging. The algorithm then ends.
2. A consumer clustering method for WeChat marketing, characterised in that the information processed by the method is consumer information collected from WeChat, including: personal information filled in voluntarily by the consumer, records of the consumer's operations after following the WeChat official account, purchase behaviour information, and opinion feedback information; each class of customers contains $n$ data objects in total, each data object has $m$ attribute values, and $x_{ij}$ denotes the $j$-th attribute of the $i$-th customer in the customer data set; the method comprises the following steps:
Sample the data set $D$ $J$ times, drawing the same number of customer objects each time, to form the sample sets $S_1, \dots, S_J$;
Set the number of clusters $K'$, with $K' > K$ where $K$ is the final number of clusters, and run the k-means algorithm on each sample set $S_t$ to obtain $J$ groups of $K'$ cluster centres;
For each sample set $S_t$, using the number of customers $n_k$ in each cluster $C_k$, calculate the error sum of squares of each cluster with the following formula:
$$E_k = \sum_{i=1}^{n_k} \sum_{j=1}^{m} (x_{ij} - c_{kj})^2$$
where $x_{ij}$ is the $j$-th attribute value of the $i$-th customer in the cluster and $c_{kj}$ is the value of attribute $j$ of that cluster's centre;
Using the error-sum-of-squares criterion function, calculate the error sum of squares of each sample set $S_t$ with the following formula:
$$E(S_t) = \sum_{k=1}^{K'} E_k$$
Select the group of cluster centres corresponding to the minimum value of $E(S_t)$ and output it as the initial cluster centres;
With the selected cluster centres as the initial cluster centres and $K'$ as the number of clusters, run the k-means algorithm on the whole of data set $D$ to obtain $K'$ clusters;
Calculate the distance between every two of these clusters (the Euclidean distance between their cluster centres) with the following formula:
$$d(C_p, C_q) = \sqrt{\sum_{j=1}^{m} (c_{pj} - c_{qj})^2}$$
where $c_{pj}$ is the value of attribute $j$ of the cluster centre of cluster $C_p$; merge the two clusters with the smallest distance and recalculate the cluster centre after the merge; when the number of clusters has been reduced to $K$, stop merging and output the $K$ cluster sets; the improved k-means algorithm then ends, yielding a clustering of data set $D$ according to the customer value contribution index.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610497893.4A CN106096052A (en) | 2016-06-25 | 2016-06-25 | A consumer clustering method for WeChat marketing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610497893.4A CN106096052A (en) | 2016-06-25 | 2016-06-25 | A consumer clustering method for WeChat marketing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106096052A true CN106096052A (en) | 2016-11-09 |
Family
ID=57215243
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610497893.4A Pending CN106096052A (en) | 2016-06-25 | 2016-06-25 | A consumer clustering method for WeChat marketing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106096052A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108038709A (en) * | 2017-11-03 | 2018-05-15 | 平安科技(深圳)有限公司 | Client's sampling pilot marketing method, electronic device and computer-readable recording medium |
CN108573266A (en) * | 2017-03-10 | 2018-09-25 | 中国移动通信集团河北有限公司 | The method and apparatus for extracting common trait |
WO2020113363A1 (en) * | 2018-12-03 | 2020-06-11 | Siemens Mobility GmbH | Method and apparatus for classifying data |
CN111488941A (en) * | 2020-04-15 | 2020-08-04 | 烽火通信科技股份有限公司 | Video user grouping method and device based on improved Kmeans algorithm |
CN111527486A (en) * | 2017-12-28 | 2020-08-11 | 东京毅力科创株式会社 | Data processing device, data processing method, and program |
CN113052505A (en) * | 2021-04-30 | 2021-06-29 | 中国银行股份有限公司 | Cross-border travel recommendation method, device and equipment based on artificial intelligence |
-
2016
- 2016-06-25 CN CN201610497893.4A patent/CN106096052A/en active Pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108573266A (en) * | 2017-03-10 | 2018-09-25 | 中国移动通信集团河北有限公司 | The method and apparatus for extracting common trait |
CN108038709A (en) * | 2017-11-03 | 2018-05-15 | 平安科技(深圳)有限公司 | Client's sampling pilot marketing method, electronic device and computer-readable recording medium |
CN111527486A (en) * | 2017-12-28 | 2020-08-11 | 东京毅力科创株式会社 | Data processing device, data processing method, and program |
WO2020113363A1 (en) * | 2018-12-03 | 2020-06-11 | Siemens Mobility GmbH | Method and apparatus for classifying data |
CN111488941A (en) * | 2020-04-15 | 2020-08-04 | 烽火通信科技股份有限公司 | Video user grouping method and device based on improved Kmeans algorithm |
CN111488941B (en) * | 2020-04-15 | 2022-05-13 | 烽火通信科技股份有限公司 | Video user grouping method and device based on improved Kmeans algorithm |
CN113052505A (en) * | 2021-04-30 | 2021-06-29 | 中国银行股份有限公司 | Cross-border travel recommendation method, device and equipment based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20161109 |