CN109002858A - Clustering ensemble method based on evidential reasoning for user behavior analysis - Google Patents
- Publication number
- CN109002858A (application CN201810814178.8A)
- Authority
- CN
- China
- Prior art keywords
- formula
- cluster
- clustering
- user behavior
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention provides a clustering ensemble method based on evidential reasoning for user behavior analysis. The method fully accounts for the temporal characteristics of user data and the reliability of the base clusterers. By applying evidential reasoning, it addresses both the weak robustness and stability of single clusterers and the poor applicability of existing clustering ensemble methods, thereby improving the clustering of user behavior data. The beneficial effects of the invention are as follows: it overcomes the failure of traditional clustering algorithms on high-dimensional user behavior data; it addresses the weak robustness and stability of single clusterers together with the poor applicability of existing clustering ensemble methods, improving the clustering of user behavior data; and it can be used to cluster user behavior data, in particular user behavior data with high-dimensional features, as well as stream data, so it has a wide range of applications.
Description
Technical field
The present invention relates to the technical field of clustering methods, and in particular to a clustering ensemble method based on evidential reasoning for user behavior analysis.
Background technique
Commonly used clustering methods fall into five classes: partition-based, hierarchy-based, model-based, density-based, and grid-based clustering methods. A representative partition-based method is k-means clustering, whose idea is to assign each object to the cluster whose center is nearest. Hierarchy-based methods cluster a given set of data objects by creating a hierarchical decomposition of it. A representative density-based method is the DBSCAN algorithm, which assumes that the cluster structure can be determined by how tightly the samples are distributed. Model-based methods include the EM algorithm, which can be used for maximum likelihood estimation or maximum a posteriori estimation of the parameters of probabilistic models containing latent variables. Grid-based methods quantize the object space into a finite number of cells that form a grid structure, and all clustering is performed on that grid.
In general, by mining user behavior data and analyzing behavioral characteristics, these single clustering methods can effectively identify user behavior patterns and evaluate demand response potential, thereby providing a basis for decisions when formulating marketing plans. However, user behavior data are continuously updated, data volume is growing rapidly, and the users from whom data are collected are highly dispersed. Faced with this series of challenges, existing methods rely on a single clustering model, so their stability and accuracy are highly sensitive to changes in the data and their generalization ability and adaptability are weak; they cannot analyze the electricity consumption behavior of different types of users deeply, quickly, and accurately. The root cause is the inherent ambiguity of the notion of a natural grouping in a data set. Another difficulty is the diversity of clusters: clusters can have different shapes, densities, and sizes, and they often overlap. Because single clustering algorithms suffer from these problems, much research on clustering ensemble algorithms has appeared in recent years. The idea of a clustering ensemble is to generate a cluster collective, that is, a set of multiple available clustering results, and then to combine these results to obtain a better clustering. The problem of combining the member clusterings of the collective is called the consensus function problem, or the integration problem. Existing clustering ensemble methods include methods based on co-occurrence and methods based on the median partition. Co-occurrence-based methods include relabeling and voting methods, co-association matrix methods, and graph-based methods; median-partition-based methods include genetic algorithms, non-negative matrix factorization, and kernel methods. In recent years, research on clustering ensembles has attracted the attention of many researchers, and evidential reasoning, as an effective information fusion method, has been applied in many fields. However, no prior art has yet incorporated the evidential reasoning rule into the clustering ensemble process.
Summary of the invention
To remedy the above technical deficiencies of the prior art, the present invention provides a clustering ensemble method based on evidential reasoning for user behavior analysis. The method fully accounts for the temporal characteristics of user data and the reliability of the base clusterers. By applying evidential reasoning, it addresses both the weak robustness and stability of single clusterers and the poor adaptability of existing clustering ensemble methods, thereby improving the clustering of user behavior data.
The present invention is achieved by the following technical solution:
A clustering ensemble method based on evidential reasoning for user behavior analysis, suitable for stream data sets with temporal characteristics, comprises the following steps:
Step 1: for the user behavior data sets $\{D_1, D_2, \ldots, D_k, \ldots, D_K\}$ of different time periods, use fuzzy C-means (FCM) algorithms with different parameters to generate K membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$, where $D_k$ denotes the data of the k-th period and $U_k$ denotes the k-th membership matrix. The user behavior data sets are obtained by slicing the original stream data, which has temporal characteristics, by time window.
Step 2: convert the K membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$ obtained in step 1 into K similarity matrices $\{SM_1, SM_2, \ldots, SM_k, \ldots, SM_K\}$, convert the K similarity matrices into similarity vectors $\{SV_1, SV_2, \ldots, SV_k, \ldots, SV_K\}$, and normalize them, where a similarity vector is denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$.
Step 3: let the power set of $\Omega$ be given by formula (7):
$$P(\Omega) = \{\emptyset, \{H_1\}, \ldots, \{H_M\}, \{H_1, H_2\}, \ldots, \{H_1, \ldots, H_{M-1}\}, \Omega\} \quad (7)$$
Then, according to the evidential reasoning rule, the similarity vectors $\{SV_1, SV_2, \ldots, SV_K\}$ are merged by an iterative algorithm into the integrated similarity vector $SV^* = E(K) = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$, where $p_{H,E(K)}$ denotes the belief degree of evidence $E(K)$ in $H$.
Step 4: based on the integrated similarity vector $SV^*$ obtained by evidential reasoning, use the AGNES algorithm from hierarchical clustering to determine the final clustering ensemble result $\{C_1, C_2, \ldots, C_t, \ldots, C_T\}$, where $C_t$ is a cluster and T is the final number of clusters.
Compared with the prior art, the present invention has the following beneficial effects:
First, the invention slices the user behavior data by time span, clusters the user behavior data of the different time periods with the fuzzy C-means algorithm, and integrates the clusterings by the method based on evidential reasoning. This overcomes the failure of traditional clustering algorithms on high-dimensional user behavior data.
Second, the invention fully accounts for the temporal characteristics of user data and the reliability of the base clusterers. By applying evidential reasoning, it addresses both the weak robustness and stability of single clusterers and the poor applicability of existing clustering ensemble methods, thereby improving the clustering of user behavior data.
Third, the proposed method can be used to cluster user behavior data, in particular user behavior data with high-dimensional features, as well as stream data, so it has a wide range of applications.
Brief description of the drawings
Fig. 1 is the overall flow chart of the clustering ensemble method based on evidential reasoning for user behavior analysis.
Fig. 2 shows the analysis results for the sum of squared errors (SSE).
Fig. 3 shows the analysis results for the consistency index (C-index).
Fig. 4 shows the analysis results for the silhouette coefficient (SC).
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
Embodiment 1:
As shown in Fig. 1, a clustering ensemble method based on evidential reasoning for user behavior analysis, suitable for stream data sets with temporal characteristics, comprises the following steps:
Step 1: according to the characteristics of the data, slice the user behavior data into the data sets $\{D_1, D_2, \ldots, D_k, \ldots, D_K\}$ of different time periods, using a year, a month, or a day as the time window, and use fuzzy C-means (FCM) algorithms with different parameters to generate K membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$, where $D_k$ denotes the data of the k-th period and $U_k$ denotes the k-th membership matrix. The user behavior data sets are obtained by slicing the original data by time window. For example, the experiment uses 7 years of user electricity consumption data, so if the time window is set to one year, the original data are sliced by year into seven panel data sets.
Specifically, step 1 further comprises the following steps:
Step 1.1: initialize the membership matrix U with random numbers in the interval (0, 1), such that U satisfies the constraint of formula (1):
$$\sum_{i=1}^{c} u_{ij} = 1, \quad j = 1, \ldots, N \quad (1)$$
In formula (1), $u_{ij}$ denotes the membership degree (probability) with which the j-th sample point belongs to the i-th cluster center in the membership matrix U, c denotes the number of clusters of the k-th FCM algorithm, and N is the number of samples.
Step 1.2: construct the objective function of the FCM algorithm using formula (2):
$$J = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} d_{ij}^{2} \quad (2)$$
In formula (2), the exponent m in $u_{ij}^{m}$ is the weighting coefficient of the membership degree $u_{ij}$ and is generally set to 2; $d_{ij} = \lVert c_i - x_j \rVert$ denotes the Euclidean distance between the i-th cluster center $c_i$ and the j-th data point $x_j$. Given a threshold $\delta$ or a maximum number of iterations Max_iteration, if the value of formula (2) is less than $\delta$ or the maximum number of iterations is reached, go directly to step 1.5; otherwise go to step 1.3.
Step 1.3: update the cluster centers $c_i$ and the elements $u_{ij}$ of the membership matrix U using formulas (3) and (4):
$$c_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} x_j}{\sum_{j=1}^{N} u_{ij}^{m}} \quad (3)$$
$$u_{ij} = \frac{1}{\sum_{l=1}^{c} \left( d_{ij} / d_{lj} \right)^{2/(m-1)}} \quad (4)$$
Step 1.4: return to step 1.2 and substitute the cluster centers $c_i$ and elements $u_{ij}$ updated in step 1.3 into formula (2).
Step 1.5: return to step 1.1 and repeat steps 1.1 to 1.4 K times to obtain the K membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$.
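Steps 1.1 to 1.4 follow the usual fuzzy C-means loop, which might be sketched in NumPy as below. This is an illustration under stated assumptions, not the patent's implementation: the membership matrix is stored with shape `(c, N)` (row i = cluster, column j = sample), the standard FCM update rules are used, and the loop stops when the change in the objective falls below `delta`, which is a common convention and differs from the literal wording "formula (2) is less than the threshold". The function name and defaults are hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, delta=1e-5, max_iteration=100, seed=0):
    """Cluster X (shape (N, d)) into c fuzzy clusters; returns (U, centers)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Step 1.1: random memberships in (0, 1), rescaled so each column sums to 1.
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)
    prev_J = np.inf
    for _ in range(max_iteration):
        Um = U ** m
        # Step 1.3 (centers): membership-weighted mean of the samples.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Euclidean distance d_ij between center i and sample j, shape (c, N).
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero for coincident points
        # Step 1.2: objective value for the current memberships and centers.
        J = float(np.sum(Um * d ** 2))
        # Step 1.3 (memberships): standard FCM membership update.
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
        # Assumed stopping rule: objective changed by less than delta.
        if abs(prev_J - J) < delta:
            break
        prev_J = J
    return U, centers


# Toy data: two well-separated blobs of 10 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
U, centers = fuzzy_c_means(X, c=2)
```

Step 1.5 then amounts to calling the function once per time period, for example `[fuzzy_c_means(D_k, c_k)[0] for D_k, c_k in zip(datasets, cluster_counts)]`, where `datasets` and `cluster_counts` are placeholders for the per-period data and parameters.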
Step 2: convert the K membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$ obtained in step 1 into K similarity matrices $\{SM_1, SM_2, \ldots, SM_k, \ldots, SM_K\}$, convert the K similarity matrices into similarity vectors $\{SV_1, SV_2, \ldots, SV_k, \ldots, SV_K\}$, and normalize them, where a similarity vector is denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$.
Specifically, step 2 further comprises the following steps:
Step 2.1: based on the membership matrix $U_k$ obtained in step 1, compute the similarity matrix $SM_k$ of the k-th clustering result according to formula (5):
$$SM_k = U_k (U_k)^T \quad (5)$$
In formula (5), the element in row i and column j of the similarity matrix $SM_k$ denotes the joint membership degree with which samples $x_i$ and $x_j$ in the k-th clustering result belong to the same cluster center.
Step 2.2: let the similarity vector be denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$; its values are the elements of the similarity matrix SM above the diagonal, so the number of elements is $M = N(N-1)/2$.
Step 2.3: normalize the elements of the similarity vector $SV_k$ using formula (6):
$$p_{H_m,k} = \frac{SV_k^{(m)}}{\sum_{m=1}^{M} SV_k^{(m)}} \quad (6)$$
In formula (6), $SV_k^{(m)}$ is the m-th element of the similarity vector $SV_k$, which has M elements in total; the denominator is the sum of all M elements, and $p_{H_m,k}$ is the value of the element $SV_k^{(m)}$ after normalization.
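The conversion in step 2 might look like the sketch below. It assumes the membership matrix is arranged with one row per sample, shape `(N, c)`, so that the product in formula (5) yields an N-by-N sample similarity matrix; if memberships are stored as `(c, N)`, transpose first. The function name is hypothetical.

```python
import numpy as np


def similarity_vector(U):
    """Turn an (N, c) membership matrix into a normalized similarity vector."""
    # Formula (5): joint membership of every pair of samples, shape (N, N).
    SM = U @ U.T
    # Step 2.2: keep only the entries strictly above the diagonal,
    # giving N * (N - 1) / 2 pairwise similarities.
    rows, cols = np.triu_indices(SM.shape[0], k=1)
    sv = SM[rows, cols]
    # Step 2.3: normalize so that the elements sum to 1.
    return sv / sv.sum()


# Toy example: samples 0 and 1 share a cluster, sample 2 is on its own.
U_demo = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
sv_demo = similarity_vector(U_demo)
```

The normalization divides by the sum of the pairwise similarities, so the sketch does not cover the degenerate case in which every pair has zero joint membership.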
Step 3: let the power set of $\Omega$ be given by formula (7):
$$P(\Omega) = \{\emptyset, \{H_1\}, \ldots, \{H_M\}, \{H_1, H_2\}, \ldots, \{H_1, \ldots, H_{M-1}\}, \Omega\} \quad (7)$$
Then, according to the evidential reasoning rule, the similarity vectors $\{SV_1, SV_2, \ldots, SV_K\}$ are merged by an iterative algorithm into the integrated similarity vector $SV^* = E(K) = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$, where $p_{H,E(K)}$ denotes the belief degree of evidence $E(K)$ in $H$.
Specifically, step 3 further comprises the following steps:
Step 3.1: let $w_k$ ($0 \le w_k \le 1$) and $R_k$ ($0 \le R_k \le 1$) denote the weight of the user behavior data $D_k$ and the reliability of the similarity vector $SV_k$, respectively, where $w_k = 0$ means "least important" and $w_k = 1$ means "most important", and $R_k = 0$ means "completely unreliable" and $R_k = 1$ means "completely reliable". Combining the weight $w_k$ and the reliability $R_k$, the hybrid weight of the k-th similarity vector is obtained according to formula (8).
In formula (8), $w_k$ denotes the weight of the user behavior data $D_k$: the earlier the data $D_k$ were generated, the smaller $w_k$ is.
$R_k$ is the reliability of the similarity vector $SV_k$. It is measured by the silhouette coefficient, a cluster evaluation index, and is computed according to formula (9).
Here a(i) denotes the average distance between sample $x_i$ and the other samples in the same cluster, computed by formula (10), and b(i) denotes the minimum of the average distances between sample $x_i$ and the samples of each other cluster, computed according to formula (11).
In formulas (10) and (11), d(i, A) and d(i, B) are computed as Euclidean distances; A denotes the set of samples in the same cluster as $x_i$, and B denotes a set of samples in a cluster different from that of $x_i$.
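The silhouette coefficient used to measure the reliability $R_k$ can be sketched as follows. The code uses the standard per-sample definition s(i) = (b(i) - a(i)) / max(a(i), b(i)) averaged over all samples, which is an assumption about the exact form of formula (9). The silhouette coefficient lies in [-1, 1], and how it is mapped onto a reliability in [0, 1] is not stated in the text, so no mapping is shown. The function name is hypothetical, and at least two clusters are required.

```python
import numpy as np


def silhouette_coefficient(X, labels):
    """Mean silhouette value of a hard clustering (needs at least 2 clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all samples, shape (N, N).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the sample itself from its own cluster
        if not same.any():
            continue  # singleton cluster: score 0 by the usual convention
        # a(i): mean distance to the other samples of the same cluster.
        a = D[i, same].mean()
        # b(i): smallest mean distance to the samples of any other cluster.
        b = min(D[i, labels == other].mean()
                for other in clusters if other != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


# Toy data: two tight pairs of points that are far apart.
X_demo = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
sc_good = silhouette_coefficient(X_demo, [0, 0, 1, 1])
sc_bad = silhouette_coefficient(X_demo, [0, 1, 0, 1])
```

For a fuzzy clustering, hard labels can be taken as the cluster with the largest membership for each sample before calling the function.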
Step 3.2: compute the support of evidence E(2) for H using formula (12).
In formula (12), the two hybrid weights are those of the similarity vectors $SV_1$ and $SV_2$, and $p_{H,1}$ and $p_{H,2}$ denote the elements of the similarity vectors $SV_1$ and $SV_2$, respectively.
Step 3.3: normalize all the supports obtained with formula (12) using formula (13) to obtain the belief degree of evidence E(2) in H.
In formula (13), the belief degree $p_{H,E(2)}$ is the value of the support after normalization, and the normalization uses the sum of all the supports.
Step 3.4: compute the residual support of evidence E(2) using formula (14).
Step 3.5: let the fusion result of the first k similarity vectors be denoted E(k), and compute the support of E(k) for H according to formula (15).
In formula (15), $m_{H,E(k-1)}$ denotes the normalized support, obtained by substituting the initial value and iterating in combination with formula (16); $m_{P(\Omega),E(k-1)}$ denotes the normalized residual support, obtained by substituting the initial value into formula (18) and then substituting formulas (18) and (15) into formula (17) and iterating.
Step 3.6: normalize the support according to formula (19) to obtain $p_{H,E(k)}$.
In formula (19), the sum runs over all the supports, and the normalized values satisfy the condition given with the formula. Through the above iterative evidential reasoning steps, the fusion result $SV^* = E(K)$ of the similarity vectors $\{SV_1, SV_2, \ldots, SV_K\}$ is finally obtained.
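The recursive fusion of step 3 can be illustrated with the sketch below. Formulas (8) and (12) to (19) are not reproduced in this text, so the code assumes one widely used form of the evidential reasoning rule restricted to singleton hypotheses: each piece of evidence contributes the support `w_tilde * p` to every hypothesis and the residual support `1 - w_tilde` to the power set, and two pieces are combined orthogonally and renormalized. The hybrid weight `w / (1 + w - r)` is likewise an assumption, and all function and variable names are hypothetical. Whether this matches the patent's formulas term by term cannot be confirmed from the text.

```python
import numpy as np


def hybrid_weight(w, r):
    """Assumed hybrid weight built from weight w and reliability r."""
    return w / (1.0 + w - r)


def er_fuse(P, w_tilde):
    """Fuse K normalized similarity vectors (rows of P) recursively.

    P has shape (K, M); w_tilde holds K hybrid weights in [0, 1], with
    w_tilde[0] > 0 so that the final normalization is well defined.
    """
    P = np.asarray(P, dtype=float)
    w_tilde = np.asarray(w_tilde, dtype=float)
    # Start with the first piece of evidence.
    m = w_tilde[0] * P[0]        # support for each hypothesis H
    m_res = 1.0 - w_tilde[0]     # residual support held by the power set
    for k in range(1, len(P)):
        mk = w_tilde[k] * P[k]
        mk_res = 1.0 - w_tilde[k]
        # Unnormalized combined support: each side's support weighted by the
        # other side's residual, plus the agreement term m * mk.
        m_hat = mk_res * m + m_res * mk + m * mk
        m_res_hat = mk_res * m_res
        # Normalize so supports and residual support sum to 1 again.
        total = m_hat.sum() + m_res_hat
        m, m_res = m_hat / total, m_res_hat / total
    # Final belief degrees: supports renormalized without the residual.
    return m / m.sum()


# Toy example: the second vector has zero weight, so the first one survives.
P_demo = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
fused_demo = er_fuse(P_demo, [0.8, 0.0])
```

In this sketch a vector with hybrid weight 0 leaves the running result unchanged, which gives a quick sanity check on the recursion.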
Step 4: based on the integrated similarity vector $SV^*$ obtained by evidential reasoning, use the AGNES algorithm from hierarchical clustering to determine the final clustering ensemble result $\{C_1, C_2, \ldots, C_t, \ldots, C_T\}$, where $C_t$ is a cluster and T is the final number of clusters.
Specifically, step 4 further comprises the following steps:
Step 4.1: treat each sample as its own class, so that the number of clusters initially equals the number of samples N; the similarity between samples is given by the integrated similarity vector $SV^*$ resulting from the evidential reasoning above.
Step 4.2: find the largest element max_SV* of the integrated similarity vector $SV^*$, and merge the samples $x_i$ and $x_j$ that it represents into one class, denoted $C_t$.
Step 4.3: compute the similarity between this class and the other classes using formula (20):
$$sim(C_s, C_t) = \frac{1}{|C_s| \, |C_t|} \sum_{x \in C_s} \sum_{x' \in C_t} sim(x, x') \quad (20)$$
In formula (20), sim(x, x') denotes the similarity between a sample x from cluster $C_s$ and a sample x' from cluster $C_t$, which corresponds one-to-one with the value of an element of the similarity vector $SV^*$; $|C_s|$ and $|C_t|$ denote the numbers of samples in clusters $C_s$ and $C_t$, respectively.
Step 4.4: if the number of clusters has reached the preset final number T, stop; otherwise repeat steps 4.2 and 4.3 until the number of clusters reaches T.
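Steps 4.1 to 4.4 describe agglomerative clustering on similarities, which might be sketched as below. The code assumes that the cluster-to-cluster similarity is the average of the pairwise sample similarities (average linkage) and that the fused vector lists the upper-triangle pairs in row-major order; both are assumptions, and the function names are hypothetical. The nested loops favor clarity over speed.

```python
import numpy as np


def vector_to_matrix(sv, n):
    """Rebuild a symmetric (n, n) similarity matrix from an upper-triangle vector."""
    S = np.zeros((n, n))
    rows, cols = np.triu_indices(n, k=1)
    S[rows, cols] = sv
    return S + S.T


def agnes_average_link(S, T):
    """Merge the most similar clusters until T clusters remain (1 <= T <= N)."""
    # Step 4.1: every sample starts as its own cluster.
    clusters = [[i] for i in range(S.shape[0])]
    while len(clusters) > T:
        best, pair = -np.inf, None
        for s in range(len(clusters)):
            for t in range(s + 1, len(clusters)):
                # Average pairwise similarity between the two clusters.
                sim = S[np.ix_(clusters[s], clusters[t])].mean()
                if sim > best:
                    best, pair = sim, (s, t)
        # Steps 4.2 and 4.3: merge the most similar pair and continue.
        s, t = pair
        clusters[s] = clusters[s] + clusters[t]
        del clusters[t]
    return clusters


# Toy example with 4 samples; pair order: (0,1),(0,2),(0,3),(1,2),(1,3),(2,3).
sv_star = np.array([0.9, 0.1, 0.1, 0.1, 0.1, 0.8])
result = agnes_average_link(vector_to_matrix(sv_star, 4), T=2)
```

With singleton clusters, the most similar pair is exactly the largest element of the similarity vector, so the first merge coincides with step 4.2.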
The method of the present invention is demonstrated experimentally below with a specific example:
1. Data set
This embodiment uses the electricity consumption behavior data of commercial users in a coastal city in China to verify the effectiveness of the clustering ensemble method for user behavior analysis. The data cover 169 commercial users over 7 years, from 2010 to 2016.
2. Evaluation indices
This embodiment uses three indices common in the clustering field as experimental evaluation indices: the silhouette coefficient (SC), the sum of squared errors (SSE), and the consistency index (C-index). SSE is obtained by summing the distances from the center of each class to all of its sample points; it is a widely used evaluation index in the clustering field, and a smaller SSE indicates better clustering. The C-index reflects clustering quality mainly in terms of cohesion; a smaller value indicates better clustering. SC takes both cohesion and separation into account and can effectively compare different clustering algorithms on the same data set; a larger SC indicates better clustering. The silhouette coefficient, the sum of squared errors, and the C-index are computed by formulas (9), (21), and (22), respectively.
In formula (21), $N_t$ denotes the number of samples in the t-th cluster $C_t$, and the remaining symbol denotes the center of cluster $C_t$. In formula (22), the three quantities denote the sum of the Euclidean distances between all pairs of samples within cluster $C_t$, the minimum distance between all samples, and the maximum distance between all samples.
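The two remaining indices can be sketched as follows. Formulas (21) and (22) are not reproduced in this text, so the code assumes the common textbook definitions: SSE sums squared Euclidean distances to the cluster mean, and the C-index is (S - S_min) / (S_max - S_min), where S is the sum of within-cluster pairwise distances and S_min and S_max are the sums of the same number of smallest and largest pairwise distances overall. The function names are hypothetical, and the degenerate case S_max = S_min is not handled.

```python
import numpy as np


def sse(X, labels):
    """Sum of squared distances from each sample to its cluster mean."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for t in np.unique(labels):
        members = X[labels == t]
        center = members.mean(axis=0)
        total += float(((members - center) ** 2).sum())
    return total


def c_index(X, labels):
    """C-index under the assumed definition; smaller means more compact."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    rows, cols = np.triu_indices(len(X), k=1)
    # All pairwise distances, and a mask of the pairs sharing a cluster.
    D = np.linalg.norm(X[rows] - X[cols], axis=1)
    within = labels[rows] == labels[cols]
    n_w = int(within.sum())
    S = D[within].sum()
    D_sorted = np.sort(D)
    S_min = D_sorted[:n_w].sum()                  # n_w smallest distances
    S_max = D_sorted[len(D_sorted) - n_w:].sum()  # n_w largest distances
    return float((S - S_min) / (S_max - S_min))


# Toy data: two tight pairs of points that are far apart.
X_eval = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
labels_eval = [0, 0, 1, 1]
sse_value = sse(X_eval, labels_eval)
c_value = c_index(X_eval, labels_eval)
```

On this toy data the within-cluster pairs are also the two closest pairs overall, so the C-index reaches its best value.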
3. Experimental results
To verify the effectiveness of the proposed method, experiments were carried out on the commercial user electricity consumption behavior data set. The clustering ensemble method based on evidential reasoning for user behavior analysis provided by the invention (ERCE) was compared with six comparison methods: fuzzy C-means clustering (FCM), K-means clustering (K-Means), density clustering (DBSCAN), hierarchical clustering (Hierarchy), projected clustering (ProClus), and voting K-means clustering (Voting-Kmeans). The experimental results are shown in Table 1 and Figs. 2, 3, and 4. In Fig. 2 the abscissa is the number of clusters and the ordinate is the SSE value; the ordinate is the C-index value in Fig. 3 and the SC value in Fig. 4.
Table 1. Clustering results of ERCE and the comparison algorithms (number of clusters: 2-10)
Table 1 shows that the proposed clustering ensemble method for user behavior analysis, ERCE, outperforms the other six clustering methods on all three evaluation indices: SSE, C-index, and SC. Table 1 also shows that ERCE improves considerably on the base clusterer FCM, which further demonstrates the effectiveness of the proposed clustering fusion method based on evidential reasoning.
Figs. 2, 3, and 4 clearly show that the clustering ensemble method of the present invention performs well on every index and obtains better results under different numbers of clusters. In addition, the curves in these figures have their "inflection point" exactly at 6 clusters, and when the number of clusters exceeds 6 the cluster evaluation indices level off. Therefore the optimal number of clusters for this user behavior data set is 6. When the method of the invention is applied to other, similar data sets, the best number of clusters can be determined in the same way.
Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
1. A clustering ensemble method based on evidential reasoning for user behavior analysis, suitable for stream data sets with temporal characteristics, characterized by comprising the following steps:
Step 1: for the user behavior data sets $\{D_1, D_2, \ldots, D_k, \ldots, D_K\}$ of different time periods, using fuzzy C-means (FCM) algorithms with different parameters to generate K membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$, where $D_k$ denotes the data of the k-th period and $U_k$ denotes the k-th membership matrix; the user behavior data sets are obtained by slicing the original stream data, which has temporal characteristics, by time window;
Step 2: converting the K membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$ obtained in step 1 into K similarity matrices $\{SM_1, SM_2, \ldots, SM_k, \ldots, SM_K\}$, converting the K similarity matrices into similarity vectors $\{SV_1, SV_2, \ldots, SV_k, \ldots, SV_K\}$, and normalizing them, where a similarity vector is denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$;
Step 3: letting the power set of $\Omega$ be given by formula (7):
$$P(\Omega) = \{\emptyset, \{H_1\}, \ldots, \{H_M\}, \{H_1, H_2\}, \ldots, \{H_1, \ldots, H_{M-1}\}, \Omega\} \quad (7)$$
then, according to the evidential reasoning rule, merging the similarity vectors $\{SV_1, SV_2, \ldots, SV_K\}$ by an iterative algorithm into the integrated similarity vector $SV^* = E(K) = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$, where $p_{H,E(K)}$ denotes the belief degree of evidence $E(K)$ in $H$;
Step 4: based on the integrated similarity vector $SV^*$ obtained by evidential reasoning, using the AGNES algorithm from hierarchical clustering to determine the final clustering ensemble result $\{C_1, C_2, \ldots, C_t, \ldots, C_T\}$, where $C_t$ is a cluster and T is the final number of clusters.
2. The clustering ensemble method based on evidential reasoning for user behavior analysis according to claim 1, characterized in that step 1 specifically comprises:
Step 1.1: initializing the membership matrix U with random numbers in the interval (0, 1), such that U satisfies the constraint of formula (1):
$$\sum_{i=1}^{c} u_{ij} = 1, \quad j = 1, \ldots, N \quad (1)$$
In formula (1), $u_{ij}$ denotes the membership degree (probability) with which the j-th sample point belongs to the i-th cluster center in the membership matrix U, c denotes the number of clusters of the k-th FCM algorithm, and N is the number of samples;
Step 1.2: constructing the objective function of the FCM algorithm using formula (2):
$$J = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} d_{ij}^{2} \quad (2)$$
In formula (2), the exponent m in $u_{ij}^{m}$ is the weighting coefficient of the membership degree $u_{ij}$ and is generally set to 2; $d_{ij} = \lVert c_i - x_j \rVert$ denotes the Euclidean distance between the i-th cluster center $c_i$ and the j-th data point $x_j$; given a threshold $\delta$ or a maximum number of iterations Max_iteration, if the value of formula (2) is less than $\delta$ or the maximum number of iterations is reached, going directly to step 1.5, and otherwise going to step 1.3;
Step 1.3: updating the cluster centers $c_i$ and the elements $u_{ij}$ of the membership matrix U using formulas (3) and (4):
$$c_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} x_j}{\sum_{j=1}^{N} u_{ij}^{m}} \quad (3)$$
$$u_{ij} = \frac{1}{\sum_{l=1}^{c} \left( d_{ij} / d_{lj} \right)^{2/(m-1)}} \quad (4)$$
Step 1.4: returning to step 1.2 and substituting the cluster centers $c_i$ and elements $u_{ij}$ updated in step 1.3 into formula (2);
Step 1.5: returning to step 1.1 and repeating steps 1.1 to 1.4 K times to obtain the K membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$.
3. The clustering ensemble method based on evidential reasoning for user behavior analysis according to claim 1, characterized in that step 2 specifically comprises:
Step 2.1: based on the membership matrix $U_k$ obtained in step 1, computing the similarity matrix $SM_k$ of the k-th clustering result according to formula (5):
$$SM_k = U_k (U_k)^T \quad (5)$$
In formula (5), the element in row i and column j of the similarity matrix $SM_k$ denotes the joint membership degree with which samples $x_i$ and $x_j$ in the k-th clustering result belong to the same cluster center;
Step 2.2: letting the similarity vector be denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$, whose values are the elements of the similarity matrix SM above the diagonal, so that the number of elements is $M = N(N-1)/2$;
Step 2.3: normalizing the elements of the similarity vector $SV_k$ using formula (6):
$$p_{H_m,k} = \frac{SV_k^{(m)}}{\sum_{m=1}^{M} SV_k^{(m)}} \quad (6)$$
In formula (6), $SV_k^{(m)}$ is the m-th element of the similarity vector $SV_k$, which has M elements in total; the denominator is the sum of all M elements, and $p_{H_m,k}$ is the value of the element $SV_k^{(m)}$ after normalization.
4. The clustering ensemble method based on evidential reasoning for user behavior analysis according to claim 1, characterized in that step 3 specifically comprises:
Step 3.1: letting $w_k$ ($0 \le w_k \le 1$) and $R_k$ ($0 \le R_k \le 1$) denote the weight of the user behavior data $D_k$ and the reliability of the similarity vector $SV_k$, respectively, where $w_k = 0$ means "least important" and $w_k = 1$ means "most important", and $R_k = 0$ means "completely unreliable" and $R_k = 1$ means "completely reliable"; combining the weight $w_k$ and the reliability $R_k$, obtaining the hybrid weight of the k-th similarity vector according to formula (8);
In formula (8), $w_k$ denotes the weight of the user behavior data $D_k$: the earlier the data $D_k$ were generated, the smaller $w_k$ is;
$R_k$ is the reliability of the similarity vector $SV_k$; it is measured by the silhouette coefficient, a cluster evaluation index, and is computed according to formula (9);
Here a(i) denotes the average distance between sample $x_i$ and the other samples in the same cluster, computed by formula (10), and b(i) denotes the minimum of the average distances between sample $x_i$ and the samples of each other cluster, computed according to formula (11);
In formulas (10) and (11), d(i, A) and d(i, B) are computed as Euclidean distances; A denotes the set of samples in the same cluster as $x_i$, and B denotes a set of samples in a cluster different from that of $x_i$;
Step 3.2: computing the support of evidence E(2) for H using formula (12);
In formula (12), the two hybrid weights are those of the similarity vectors $SV_1$ and $SV_2$, and $p_{H,1}$ and $p_{H,2}$ denote the elements of the similarity vectors $SV_1$ and $SV_2$, respectively;
Step 3.3: normalizing all the supports obtained with formula (12) using formula (13) to obtain the belief degree of evidence E(2) in H;
In formula (13), the belief degree $p_{H,E(2)}$ is the value of the support after normalization, and the normalization uses the sum of all the supports;
Step 3.4: computing the residual support of evidence E(2) using formula (14);
Step 3.5: letting the fusion result of the first k similarity vectors be denoted E(k), and computing the support of E(k) for H according to formula (15);
In formula (15), $m_{H,E(k-1)}$ denotes the normalized support, obtained by substituting the initial value and iterating in combination with formula (16); $m_{P(\Omega),E(k-1)}$ denotes the normalized residual support, obtained by substituting the initial value into formula (18) and then substituting formulas (18) and (15) into formula (17) and iterating;
Step 3.6: normalizing the support according to formula (19) to obtain $p_{H,E(k)}$;
In formula (19), the sum runs over all the supports, and the normalized values satisfy the condition given with the formula; through the above iterative evidential reasoning steps, the fusion result $SV^* = E(K)$ of the similarity vectors $\{SV_1, SV_2, \ldots, SV_K\}$ is finally obtained.
5. The clustering ensemble method based on evidential reasoning for user behavior analysis according to claim 1, characterized in that step 4 specifically comprises:
Step 4.1: treating each sample as its own class, so that the number of clusters initially equals the number of samples N, the similarity between samples being given by the integrated similarity vector $SV^*$ resulting from the evidential reasoning above;
Step 4.2: finding the largest element max_SV* of the integrated similarity vector $SV^*$, and merging the samples $x_i$ and $x_j$ that it represents into one class, denoted $C_t$;
Step 4.3: computing the similarity between this class and the other classes using formula (20):
$$sim(C_s, C_t) = \frac{1}{|C_s| \, |C_t|} \sum_{x \in C_s} \sum_{x' \in C_t} sim(x, x') \quad (20)$$
In formula (20), sim(x, x') denotes the similarity between a sample x from cluster $C_s$ and a sample x' from cluster $C_t$, which corresponds one-to-one with the value of an element of the similarity vector $SV^*$; $|C_s|$ and $|C_t|$ denote the numbers of samples in clusters $C_s$ and $C_t$, respectively;
Step 4.4: if the number of clusters has reached the preset final number T, stopping; otherwise repeating steps 4.2 and 4.3 until the number of clusters reaches T.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810814178.8A CN109002858B (en) | 2018-07-23 | 2018-07-23 | Evidence reasoning-based integrated clustering method for user behavior analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810814178.8A CN109002858B (en) | 2018-07-23 | 2018-07-23 | Evidence reasoning-based integrated clustering method for user behavior analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109002858A (en) | 2018-12-14 |
CN109002858B (en) | 2022-01-28 |
Family
ID=64596928
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810814178.8A Active CN109002858B (en) | Evidence reasoning-based integrated clustering method for user behavior analysis | 2018-07-23 | 2018-07-23 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109002858B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102004796A (en) * | 2010-12-24 | 2011-04-06 | 钱钢 | Non-retardant hierarchical classification method and device of webpage texts |
CN102098180A (en) * | 2011-02-17 | 2011-06-15 | 华北电力大学 | Network security situational awareness method |
CN105873119A (en) * | 2016-05-26 | 2016-08-17 | 重庆大学 | Method for classifying flow use behaviors of mobile network user groups |
CN105975956A (en) * | 2016-05-30 | 2016-09-28 | 重庆大学 | Infrared-panorama-pick-up-head-based abnormal behavior identification method of elderly people living alone |
CN106295688A (en) * | 2016-08-02 | 2017-01-04 | 浙江工业大学 | A kind of fuzzy clustering method based on sparse average |
CN106951687A (en) * | 2017-02-28 | 2017-07-14 | 广东电网有限责任公司惠州供电局 | Transformer insulated Stress calculation and evaluation method based on fuzzy logic and evidential reasoning |
US20180096052A1 (en) * | 2016-11-22 | 2018-04-05 | Flytxt BV | Systems and methods for management of multi-perspective customer segments |
Non-Patent Citations (4)
Title |
---|
D.W. LEVERINGTON et al.: "An evaluation of consensus neural networks and evidential reasoning algorithms for image classification", IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM * |
YU WENDONG et al.: "Social media user partitioning based on ensemble clustering", 2016 13TH INTERNATIONAL CONFERENCE ON SERVICE SYSTEMS AND SERVICE MANAGEMENT (ICSSSM) * |
BI KAI et al.: "Fuzzy clustering ensemble method based on fuzzy measure and evidence theory", Control and Decision * |
FEI BOWEN et al.: "Fuzzy clustering ensemble model based on distance decision", Journal of Electronics & Information Technology * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110555110A (en) * | 2019-09-10 | 2019-12-10 | 哈尔滨工业大学 | text clustering method combining K-means and evidence accumulation |
CN111144612A (en) * | 2019-11-27 | 2020-05-12 | 北京中交兴路信息科技有限公司 | Gas station position point prediction method and device, storage medium and terminal |
CN111160385A (en) * | 2019-11-27 | 2020-05-15 | 北京中交兴路信息科技有限公司 | Method, device, equipment and storage medium for aggregating mass location points |
CN111144612B (en) * | 2019-11-27 | 2023-05-09 | 北京中交兴路信息科技有限公司 | Method and device for predicting position point of gas station, storage medium and terminal |
CN111241162A (en) * | 2020-01-16 | 2020-06-05 | 同济大学 | Method for analyzing travel behaviors of passengers under high-speed railway network formation condition and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109002858B (en) | 2022-01-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||