CN109002858A - A clustering ensemble method based on evidential reasoning for user behavior analysis - Google Patents

Info

Publication number
CN109002858A
Authority
CN
China
Prior art keywords
formula
cluster
clustering
user behavior
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810814178.8A
Other languages
Chinese (zh)
Other versions
CN109002858B (en)
Inventor
褚燕
王刚
张峰
陈刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN201810814178.8A priority Critical patent/CN109002858B/en
Publication of CN109002858A publication Critical patent/CN109002858A/en
Application granted granted Critical
Publication of CN109002858B publication Critical patent/CN109002858B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a clustering ensemble method based on evidential reasoning for user behavior analysis. The method takes full account of the temporal characteristics of user data and the credibility of each base clusterer, and uses evidential reasoning to address both the weak robustness and stability of a single clusterer and the poor applicability of existing clustering ensemble methods, thereby improving the clustering of user behavior data. The beneficial effects of the invention are as follows: it overcomes the failure of traditional clustering algorithms on user behavior data that is caused by high dimensionality; it addresses both the weak robustness and stability of a single clusterer and the poor applicability of existing clustering ensemble methods, thereby improving the clustering of user behavior data; and it can be used to cluster user behavior data, in particular user behavior data with high-dimensional features, as well as stream data, so it has a wide range of applications.

Description

A clustering ensemble method based on evidential reasoning for user behavior analysis
Technical field
The present invention relates to the technical field of clustering methods, and in particular to a clustering ensemble method based on evidential reasoning for user behavior analysis.
Background art
Commonly used clustering methods fall into five classes: partition-based, hierarchy-based, density-based, model-based and grid-based methods. Partition-based methods, represented by k-means clustering, assign each object to the cluster whose center is nearest. Hierarchy-based methods cluster by building a hierarchical decomposition of the given set of data objects. Density-based methods, represented by the DBSCAN algorithm, assume that the cluster structure can be determined from how tightly the samples are distributed. Model-based methods, such as the EM algorithm, can be used for maximum likelihood estimation or maximum a posteriori estimation of the parameters of probabilistic models that contain hidden (latent) variables. Grid-based methods quantize the object space into a finite number of cells that form a grid structure, and all clustering operations are carried out on that grid.
In general, by mining user behavior data and analyzing behavioral characteristics, these single clustering methods can effectively identify user behavior patterns and evaluate demand-response potential, thereby providing a decision basis for formulating marketing plans. However, user behavior data is continually updated, data volume is growing rapidly, and the users from whom data is collected are highly dispersed. Existing methods rely on a single clustering model whose stability and accuracy are highly sensitive to changes in the data; their generalization ability and adaptability are weak, and they cannot analyze the electricity consumption behavior of different types of users deeply, quickly and accurately. The root cause is the inherent ambiguity of the notion of a natural grouping in a data set. A further difficulty is the diversity of clusters: clusters can have different shapes, densities and sizes, and they often overlap. Because a single clustering algorithm often suffers from such problems, much research on clustering ensemble algorithms has appeared in recent years. The idea of a clustering ensemble is to generate a collection of clusterings, that is, multiple available clustering results, and then to combine these results to obtain a better clustering. The problem of combining the member clusterings of the collection is also called the consensus function problem, or the integration problem. Existing clustering ensemble methods include co-occurrence-based methods and median-partition-based methods. Co-occurrence-based methods include relabeling and voting methods, co-association matrix methods and graph-based methods; median-partition-based methods include genetic algorithms, non-negative matrix factorization and kernel methods. In recent years research on clustering ensembles has attracted the attention of many researchers, and evidential reasoning, as an effective information fusion method, has been applied in many fields. However, there is currently no prior art that incorporates the evidential reasoning rule into the clustering ensemble process.
Summary of the invention
To remedy the above technical deficiencies of the prior art, the present invention provides a clustering ensemble method based on evidential reasoning for user behavior analysis. The method takes full account of the temporal characteristics of user data and the credibility of each base clusterer, and uses evidential reasoning to address both the weak robustness and stability of a single clusterer and the poor adaptability of existing clustering ensemble methods, thereby improving the clustering of user behavior data.
The present invention is achieved by the following technical solution:
A clustering ensemble method based on evidential reasoning for user behavior analysis, suitable for stream data sets with temporal characteristics, comprising the following steps:
Step 1, for the user behavior data sets $\{D_1, D_2, \ldots, D_k, \ldots, D_K\}$ of different periods, use fuzzy C-means (FCM) algorithms with different parameters to generate $K$ membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$, where $D_k$ denotes the data of the $k$-th period and $U_k$ denotes the $k$-th membership matrix. The user behavior data sets are obtained by splitting the original stream data, which has temporal characteristics, by time window;
Step 2, convert the $K$ membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$ obtained in step 1 into $K$ similarity matrices $\{SM_1, SM_2, \ldots, SM_k, \ldots, SM_K\}$, convert the $K$ similarity matrices into similarity vectors $\{SV_1, SV_2, \ldots, SV_k, \ldots, SV_K\}$, and normalize them, where a similarity vector is denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$;
Step 3, let the power set of $\Omega$ be given by formula (7). Then, according to the evidential reasoning rule, the similarity vectors $\{SV_1, SV_2, \ldots, SV_K\}$ are merged by an iterative algorithm into the integrated similarity vector $SV^* = E(K) = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$, where $p_{H,E(K)}$ denotes the belief degree that evidence $E(K)$ assigns to $H$;
Step 4, based on the integrated similarity vector $SV^*$ obtained by evidential reasoning, use the AGNES algorithm, a hierarchical clustering method, to determine the final clustering ensemble result $\{C_1, C_2, \ldots, C_t, \ldots, C_T\}$, where $C_t$ is a cluster and $T$ is the final number of clusters.
Compared with the prior art, the beneficial effects of the present invention are as follows:
First, the invention splits user behavior data by time span, clusters the user behavior data of each period with the fuzzy C-means algorithm, and combines the clusterings with a method based on evidential reasoning. This overcomes the failure of traditional clustering algorithms on user behavior data that is caused by high dimensionality.
Second, the invention takes full account of the temporal characteristics of user data and the credibility of each base clusterer, and uses evidential reasoning to address both the weak robustness and stability of a single clusterer and the poor applicability of existing clustering ensemble methods, thereby improving the clustering of user behavior data.
Third, the proposed method can be used to cluster user behavior data, in particular user behavior data with high-dimensional features, as well as stream data, so it has a wide range of applications.
Brief description of the drawings
Fig. 1 is the overall flow chart of the clustering ensemble method based on evidential reasoning for user behavior analysis.
Fig. 2 shows the analysis results for the sum of squared errors (SSE).
Fig. 3 shows the analysis results for the C-index.
Fig. 4 shows the analysis results for the silhouette coefficient (SC).
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
Embodiment 1:
As shown in Fig. 1, a clustering ensemble method based on evidential reasoning for user behavior analysis, suitable for stream data sets with temporal characteristics, comprises the following steps:
Step 1, split the user behavior data of different periods into $\{D_1, D_2, \ldots, D_k, \ldots, D_K\}$ using a time window of one year, one month or one day, chosen according to the characteristics of the data, and use fuzzy C-means (FCM) algorithms with different parameters to generate $K$ membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$, where $D_k$ denotes the data of the $k$-th period and $U_k$ denotes the $k$-th membership matrix. The user behavior data sets are obtained by splitting the raw data by time window. For example, the experiment uses seven years of user electricity consumption data; if the time window is set to one year, the raw data is split by year into seven panel data sets.
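The time-window split in step 1 can be sketched as follows. This is only an illustration: the record layout `(day, user_id, value)` and the function name `split_by_window` are assumptions made for the example and do not come from the patent.

```python
from collections import defaultdict
from datetime import date


def split_by_window(records, window="year"):
    """Group (day, user_id, value) records into time windows D_1..D_K.

    `records` is a hypothetical list of tuples; `window` is "year",
    "month" or "day". Windows are returned in chronological order.
    """
    buckets = defaultdict(list)
    for day, user_id, value in records:
        if window == "year":
            key = (day.year,)
        elif window == "month":
            key = (day.year, day.month)
        elif window == "day":
            key = (day.year, day.month, day.day)
        else:
            raise ValueError("window must be 'year', 'month' or 'day'")
        buckets[key].append((day, user_id, value))
    # Sorting the keys keeps D_1 as the earliest period.
    return [buckets[key] for key in sorted(buckets)]


records = [
    (date(2010, 1, 5), "u1", 3.0),
    (date(2010, 7, 1), "u2", 4.0),
    (date(2011, 2, 1), "u1", 5.0),
]
periods = split_by_window(records, "year")  # two windows: 2010 and 2011
```

With a yearly window, seven years of data would give seven such periods, which matches the example in the text.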
Specifically, step 1 further comprises the following steps:
Step 1.1, initialize the membership matrix $U$ with random numbers in the interval (0,1) such that $U$ satisfies the constraint of formula (1).
In formula (1), $u_{ij}$ denotes the probability that the $j$-th sample point belongs to the $i$-th cluster center in the membership matrix $U$, and $c$ denotes the number of clusters of the $k$-th fuzzy C-means algorithm.
Step 1.2, construct the objective function of the fuzzy C-means algorithm using formula (2).
In formula (2), $m$ is the weighting exponent applied to the membership degree $u_{ij}$ and is usually set to 2; the distance term is the Euclidean distance between the $i$-th cluster center $c_i$ and the $j$-th data point $x_j$. Given a threshold $\delta$ or a maximum number of iterations Max_iteration, if formula (2) is less than the threshold $\delta$ or the maximum number of iterations has been reached, go directly to step 1.5; otherwise go to step 1.3.
Step 1.3, update the cluster centers $c_i$ and the elements $u_{ij}$ of the membership matrix $U$ using formulas (3) and (4).
Step 1.4, return to step 1.2 and substitute the cluster centers $c_i$ and elements $u_{ij}$ updated in step 1.3 into formula (2).
Step 1.5, return to step 1.1 and repeat steps 1.1 to 1.4 $K$ times to obtain the $K$ membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$.
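Steps 1.1 to 1.4 describe the familiar fuzzy C-means iteration, which can be sketched in NumPy as below. The formula images are not reproduced in this text, so the sketch assumes the textbook forms: memberships that sum to 1 over the clusters for each sample for formula (1), the objective $\sum_i \sum_j u_{ij}^m \lVert x_j - c_i \rVert^2$ for formula (2), and the usual center and membership updates for formulas (3) and (4). The function name `fcm`, the `seed` argument and the use of the change in the objective as the stopping test are choices made for this example, not details taken from the patent.

```python
import numpy as np


def fcm(X, c, m=2.0, delta=1e-5, max_iteration=100, seed=0):
    """Textbook fuzzy C-means. X has shape (N, d); returns U (c, N) and centers (c, d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))           # step 1.1: random values in (0, 1)
    U /= U.sum(axis=0)                    # assumed formula (1): each column sums to 1
    previous = np.inf
    for _ in range(max_iteration):
        Um = U ** m
        # Assumed formula (3): centers are membership-weighted means.
        centers = Um @ X / Um.sum(axis=1, keepdims=True)
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, 1e-12)       # avoid division by zero
        objective = (Um * dist ** 2).sum()  # assumed formula (2)
        if abs(previous - objective) < delta:
            break                         # assumption: stop on a small change
        previous = objective
        # Assumed formula (4): standard membership update.
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=0)
    return U, centers


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
U, centers = fcm(X, c=2)
```

Running `fcm` once per period $D_k$, with a different `c` or `m` each time, would give the $K$ membership matrices that step 1.5 asks for.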
Step 2, convert the $K$ membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$ obtained in step 1 into $K$ similarity matrices $\{SM_1, SM_2, \ldots, SM_k, \ldots, SM_K\}$, convert the $K$ similarity matrices into similarity vectors $\{SV_1, SV_2, \ldots, SV_k, \ldots, SV_K\}$, and normalize them, where a similarity vector is denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$.
Specifically, step 2 further comprises the following steps:
Step 2.1, based on the membership matrix $U_k$ obtained in step 1, calculate the similarity matrix $SM_k$ of the $k$-th clustering result according to formula (5):
$SM_k = U_k (U_k)^T$ (5)
In formula (5), the element in row $i$ and column $j$ of the similarity matrix $SM_k$ denotes the joint membership of samples $x_i$ and $x_j$ in the same cluster center in the $k$-th clustering result.
Step 2.2, let a similarity vector be denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$; its values are the elements of the similarity matrix $SM$ above the diagonal, so it has $M$ elements.
Step 2.3, normalize the elements of the similarity vector $SV_k$ using formula (6).
In formula (6), the $m$-th element of the similarity vector $SV_k$, which has $M$ elements in total, is divided by the sum of all elements, and $p_{X,k}$ is the value of the element after normalization.
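Step 2 can be sketched as follows. Two assumptions are made here and should be read as such. First, formula (5) only yields a sample-by-sample matrix when $U_k$ stores one row per sample, so the sketch expects an $(N, c)$ array (a matrix indexed as $u_{ij}$ with cluster $i$ and sample $j$ would need transposing first). Second, taking the strictly upper triangle gives $N(N-1)/2$ elements, which is presumably the value of $M$. The function name `similarity_vector` is hypothetical.

```python
import numpy as np


def similarity_vector(U):
    """Turn an (N, c) membership matrix into a normalized similarity vector.

    Element (i, j) of U @ U.T is the joint membership of samples i and j
    (formula (5)); the entries above the diagonal are flattened row by row
    and divided by their sum (normalization, as in step 2.3).
    """
    U = np.asarray(U, dtype=float)
    SM = U @ U.T                              # similarity matrix, shape (N, N)
    rows, cols = np.triu_indices(len(U), k=1)  # strictly above the diagonal
    sv = SM[rows, cols]
    return sv / sv.sum()


# Three samples, two clusters: samples 0 and 1 share a cluster.
U = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
sv = similarity_vector(U)   # order of pairs: (0,1), (0,2), (1,2)
```

Because every period describes the same $N$ samples, the $K$ vectors have the same length and the same pair ordering, which is what allows them to be fused element by element afterwards.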
Step 3, let the power set of $\Omega$ be given by formula (7). Then, according to the evidential reasoning rule, the similarity vectors $\{SV_1, SV_2, \ldots, SV_K\}$ are merged by an iterative algorithm into the integrated similarity vector $SV^* = E(K) = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$, where $p_{H,E(K)}$ denotes the belief degree that evidence $E(K)$ assigns to $H$.
Specifically, step 3 further comprises the following steps:
Step 3.1, let $w_k$ ($0 \le w_k \le 1$) and $R_k$ ($0 \le R_k \le 1$) denote the weight of the user behavior data $D_k$ and the credibility of the similarity vector $SV_k$, respectively, where $w_k = 0$ means "least important" and $w_k = 1$ means "most important", and $R_k = 0$ means "completely unreliable" and $R_k = 1$ means "completely reliable". Combine the weight $w_k$ and the credibility $R_k$ to obtain the hybrid weight of the $k$-th similarity vector according to formula (8).
In formula (8), $w_k$ denotes the weight of the user behavior data $D_k$; the earlier $D_k$ was generated, the smaller $w_k$ is.
$R_k$ is the credibility of the similarity vector $SV_k$. It is measured by the silhouette coefficient, a cluster evaluation index, and is calculated according to formula (9).
Here $a(i)$ denotes the average distance between sample $x_i$ and the other samples in the same cluster, calculated by formula (10).
$b(i)$ denotes the minimum, over the other clusters, of the average distance between sample $x_i$ and the samples of that cluster, calculated according to formula (11).
In formulas (10) and (11), $d(i, A)$ and $d(i, B)$ are calculated as Euclidean distances; $A$ denotes the set of samples in the same cluster as $x_i$, and $B$ denotes a set of samples in a different cluster from $x_i$.
Step 3.2, calculate the support of evidence $E(2)$ for $H$ using formula (12).
In formula (12), the two weight terms are the hybrid weights of the similarity vectors $SV_1$ and $SV_2$, and $p_{H,1}$ and $p_{H,2}$ denote the elements of $SV_1$ and $SV_2$, respectively.
Step 3.3, normalize all the supports obtained using formula (13) to obtain the belief degree that evidence $E(2)$ assigns to $H$.
In formula (13), the belief degree $p_{H,E(2)}$ is the value of the support after normalization, and the normalizing term is the sum of the supports.
Step 3.4, calculate the residual support of evidence $E(2)$ using formula (14).
Step 3.5, let the fusion result of the first $k$ similarity vectors be denoted $E(k)$, and calculate the support of $E(k)$ for $H$ according to formula (15).
In formula (15), $m_{H,E(k-1)}$ denotes the normalized support, obtained by substituting the initial value and iterating together with formula (16); $m_{p(\Omega),E(k-1)}$ denotes the normalized residual support, obtained by substituting the initial value into formula (18) and then substituting formulas (18) and (15) into formula (17) for iterative calculation.
Step 3.6, normalize the support according to formula (19) to obtain $p_{H,E(k)}$.
In formula (19), the normalizing term is the sum of the supports over all $H$, so that the belief degrees $p_{H,E(k)}$ sum to 1. Through the above iterative evidential reasoning steps, the fusion result $SV^* = E(K)$ of the similarity vectors $\{SV_1, SV_2, \ldots, SV_K\}$ is finally obtained.
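The recursive fusion of step 3 can be illustrated with the sketch below. Formulas (8) and (12) to (19) are not reproduced in this text, so this is a stand-in rather than the patent's exact computation: it assumes the hybrid weight $w_k / (1 + w_k - R_k)$ used in the published evidential reasoning rule, assumes that each similarity vector places belief only on the single elements $H_1, \ldots, H_M$, and combines the discounted belief distributions with a Dempster-style orthogonal sum that carries a residual support on the power set. The names `hybrid_weight` and `er_fuse` are hypothetical.

```python
import numpy as np


def hybrid_weight(w, r):
    """Assumed form of formula (8): w / (1 + w - r). Undefined when w = 0 and r = 1."""
    return w / (1.0 + w - r)


def er_fuse(P, w, r):
    """Fuse K normalized similarity vectors (rows of P) into one belief vector.

    P: array of shape (K, M), each row summing to 1.
    w, r: length-K weights and credibilities in [0, 1].
    Assumes at least one hybrid weight is positive.
    """
    P = np.asarray(P, dtype=float)
    wt = hybrid_weight(np.asarray(w, dtype=float), np.asarray(r, dtype=float))
    support = wt[0] * P[0]        # support of E(1) for each H
    residual = 1.0 - wt[0]        # residual support left on the power set
    for k in range(1, len(P)):
        support_k = wt[k] * P[k]
        residual_k = 1.0 - wt[k]
        # Orthogonal sum restricted to singleton hypotheses.
        combined = residual_k * support + residual * support_k + support * support_k
        combined_residual = residual * residual_k
        total = combined.sum() + combined_residual
        support, residual = combined / total, combined_residual / total
    return support / support.sum()   # belief degrees p_{H,E(K)}, summing to 1


P = [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]
fused = er_fuse(P, w=[0.5, 1.0], r=[0.8, 0.9])  # later period weighted more
```

A piece of evidence with zero weight leaves the running result unchanged, which is consistent with the text's reading of $w_k = 0$ as "least important".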
Step 4, based on the integrated similarity vector $SV^*$ obtained by evidential reasoning, use the AGNES algorithm, a hierarchical clustering method, to determine the final clustering ensemble result $\{C_1, C_2, \ldots, C_t, \ldots, C_T\}$, where $C_t$ is a cluster and $T$ is the final number of clusters.
Specifically, step 4 further comprises the following steps:
Step 4.1, treat each sample as its own class, so that initially the number of clusters equals the number of samples $N$; the similarity between samples is given by the integrated similarity vector $SV^*$ produced by the evidential reasoning above.
Step 4.2, find the largest element max_SV* in the integrated similarity vector $SV^*$ and merge the samples $x_i$ and $x_j$ that it represents into one class, denoted $C_t$.
Step 4.3, calculate the similarity between this class and the other classes using formula (20).
In formula (20), $sim(x, x')$ denotes the similarity between a sample $x$ from cluster $C_s$ and a sample $x'$ from cluster $C_t$, which corresponds one-to-one to the value of an element of the similarity vector $SV^*$; $|C_s|$ and $|C_t|$ denote the number of samples in clusters $C_s$ and $C_t$, respectively.
Step 4.4, if the number of clusters is now $T$, stop; otherwise repeat steps 4.2 and 4.3 until the number of clusters reaches $T$.
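Step 4 can be sketched as a small average-linkage agglomerative loop over the fused similarities. Formula (20) is not reproduced in this text; the sketch assumes it is the mean pairwise similarity between two clusters, which is what the symbols $sim(x, x')$, $|C_s|$ and $|C_t|$ suggest. The helper names and the assumption that the vector lists pairs in row-major upper-triangle order are illustrative only.

```python
import numpy as np


def vector_to_matrix(sv, n):
    """Rebuild a symmetric (n, n) similarity matrix from its upper triangle."""
    S = np.zeros((n, n))
    S[np.triu_indices(n, k=1)] = sv
    return S + S.T


def agnes_average(S, T):
    """Merge the most similar pair of clusters until only T clusters remain."""
    clusters = [[i] for i in range(len(S))]      # step 4.1: one sample per cluster
    while len(clusters) > T:                     # step 4.4: stop at T clusters
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Assumed formula (20): mean similarity over all cross pairs.
                sim = S[np.ix_(clusters[a], clusters[b])].mean()
                if sim > best:
                    best, pair = sim, (a, b)
        a, b = pair                              # step 4.2: merge the best pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Four samples; pairs are ordered (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
sv_star = np.array([0.9, 0.1, 0.1, 0.1, 0.1, 0.8])
result = agnes_average(vector_to_matrix(sv_star, 4), T=2)
```

While every cluster still holds a single sample, the mean similarity reduces to one element of $SV^*$, so the first merge is the pair with the largest element, as step 4.2 describes.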
The method of the present invention is demonstrated experimentally below with a specific example:
1. Data set
This embodiment uses the electricity consumption behavior data of commercial users in a coastal city in China to verify the effectiveness of the clustering ensemble method for user behavior analysis. The data set covers 169 commercial users and spans seven years of electricity consumption data, from 2010 to 2016.
2. Evaluation indexes
This embodiment uses three indexes that are common in the clustering field as experimental evaluation indexes: the silhouette coefficient (SC), the sum of squared errors (SSE) and the C-index. SSE is obtained by summing the distances from the center of each class to all of its sample points and is a widely used evaluation index in clustering; the smaller the SSE, the better the clustering. The C-index reflects clustering quality mainly in terms of cohesion; the smaller its value, the better the clustering. SC takes both cohesion and separation into account and can effectively compare different clustering algorithms on the same data set; the larger the SC, the better the clustering. The silhouette coefficient, the sum of squared errors and the C-index are calculated by formula (9), formula (21) and formula (22), respectively.
In formula (21), $N_t$ denotes the number of samples in the $t$-th cluster $C_t$, and the center term denotes the center of cluster $C_t$. In formula (22), the three terms denote, respectively, the sum of the Euclidean distances between all pairs of samples within the clusters, the minimum distance between all samples, and the maximum distance between all samples.
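The three evaluation indexes can be sketched in NumPy as follows. Formulas (9), (21) and (22) are not reproduced in this text, so the sketch uses the commonly cited definitions: the mean of $(b(i) - a(i)) / \max\{a(i), b(i)\}$ for SC, squared Euclidean distances to the cluster mean for SSE, and for the C-index the within-cluster distance sum rescaled between the sums of the equally many smallest and largest pairwise distances. These may differ in detail from the patent's formulas, and the function names are hypothetical.

```python
import numpy as np


def pairwise(X):
    """Euclidean distance between every pair of rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def silhouette(X, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    D = pairwise(X)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                      # exclude the sample itself
        if not same.any():
            scores.append(0.0)               # convention for singleton clusters
            continue
        a = D[i, same].mean()                # cohesion a(i)
        b = min(D[i, labels == t].mean()     # separation b(i)
                for t in np.unique(labels) if t != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def sse(X, labels):
    """Sum of squared distances from each sample to its cluster mean."""
    return float(sum(((X[labels == t] - X[labels == t].mean(axis=0)) ** 2).sum()
                     for t in np.unique(labels)))


def c_index(X, labels):
    """C-index: 0 is best, 1 is worst."""
    iu = np.triu_indices(len(X), k=1)
    d = pairwise(X)[iu]
    same = (labels[:, None] == labels[None, :])[iu]
    n_within = int(same.sum())
    ordered = np.sort(d)
    s_min, s_max = ordered[:n_within].sum(), ordered[-n_within:].sum()
    return float((d[same].sum() - s_min) / (s_max - s_min))


X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
scores = (silhouette(X, labels), sse(X, labels), c_index(X, labels))
```

The silhouette coefficient lies in $[-1, 1]$, whereas the credibility $R_k$ in step 3.1 is restricted to $[0, 1]$; the text does not say how negative values are handled, so any mapping would be a further assumption.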
3. Experimental results
To verify the effectiveness of the proposed method, experiments were carried out on the commercial user electricity consumption data set, and the proposed clustering ensemble method based on evidential reasoning for user behavior analysis was compared with six methods: fuzzy C-means clustering (FCM), K-means clustering (K-Means), density clustering (DBSCAN), hierarchical clustering (Hierarchy), projective clustering (ProClus) and voting K-means clustering (Voting-Kmeans). The experimental results are shown in Table 1 and in Fig. 2, Fig. 3 and Fig. 4. In Fig. 2 the abscissa is the number of clusters and the ordinate is the SSE value; in Fig. 3 the ordinate is the C-index value; in Fig. 4 the ordinate is the SC value.
Table 1. Comparison of the clustering results of ERCE and the comparison algorithms (number of clusters: 2-10)
Table 1 shows that the proposed clustering ensemble method for user behavior analysis, ERCE, outperforms the other six clustering methods on all three evaluation indexes: SSE, C-index and SC. Table 1 also shows that ERCE improves considerably on the base clusterer FCM, which further demonstrates the effectiveness of the proposed clustering fusion method based on evidential reasoning.
Fig. 2, Fig. 3 and Fig. 4 clearly show that the clustering ensemble method of the invention, which is based on evidence theory, performs well on every index and achieves better results for every number of clusters tested. In addition, the curves in these figures have their "inflection point" at a cluster number of 6, and when the number of clusters exceeds 6 the evaluation indexes level off. Therefore, for this user behavior data set, the best number of clusters is 6. When the method of the invention is applied to other similar data sets, the best number of clusters can be determined in the same way.
Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (5)

1. A clustering ensemble method based on evidential reasoning for user behavior analysis, suitable for stream data sets with temporal characteristics, characterized in that it comprises the following steps:
Step 1, for the user behavior data sets $\{D_1, D_2, \ldots, D_k, \ldots, D_K\}$ of different periods, use fuzzy C-means algorithms with different parameters to generate $K$ membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$, where $D_k$ denotes the data of the $k$-th period and $U_k$ denotes the $k$-th membership matrix; the user behavior data sets are obtained by splitting the original stream data, which has temporal characteristics, by time window;
Step 2, convert the $K$ membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$ obtained in step 1 into $K$ similarity matrices $\{SM_1, SM_2, \ldots, SM_k, \ldots, SM_K\}$, convert the $K$ similarity matrices into similarity vectors $\{SV_1, SV_2, \ldots, SV_k, \ldots, SV_K\}$, and normalize them, where a similarity vector is denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$;
Step 3, let the power set of $\Omega$ be given by formula (7); then, according to the evidential reasoning rule, the similarity vectors $\{SV_1, SV_2, \ldots, SV_K\}$ are merged by an iterative algorithm into the integrated similarity vector $SV^* = E(K) = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$, where $p_{H,E(K)}$ denotes the belief degree that evidence $E(K)$ assigns to $H$;
Step 4, based on the integrated similarity vector $SV^*$ obtained by evidential reasoning, use the AGNES algorithm, a hierarchical clustering method, to determine the final clustering ensemble result $\{C_1, C_2, \ldots, C_t, \ldots, C_T\}$, where $C_t$ is a cluster and $T$ is the final number of clusters.
2. The clustering ensemble method based on evidential reasoning for user behavior analysis according to claim 1, characterized in that step 1 specifically comprises:
Step 1.1, initialize the membership matrix $U$ with random numbers in the interval (0,1) such that $U$ satisfies the constraint of formula (1);
In formula (1), $u_{ij}$ denotes the probability that the $j$-th sample point belongs to the $i$-th cluster center in the membership matrix $U$, and $c$ denotes the number of clusters of the $k$-th fuzzy C-means algorithm;
Step 1.2, construct the objective function of the fuzzy C-means algorithm using formula (2);
In formula (2), $m$ is the weighting exponent applied to the membership degree $u_{ij}$ and is usually set to 2; the distance term is the Euclidean distance between the $i$-th cluster center $c_i$ and the $j$-th data point $x_j$; given a threshold $\delta$ or a maximum number of iterations Max_iteration, if formula (2) is less than the threshold $\delta$ or the maximum number of iterations has been reached, go directly to step 1.5; otherwise go to step 1.3;
Step 1.3, update the cluster centers $c_i$ and the elements $u_{ij}$ of the membership matrix $U$ using formulas (3) and (4);
Step 1.4, return to step 1.2 and substitute the cluster centers $c_i$ and elements $u_{ij}$ updated in step 1.3 into formula (2);
Step 1.5, return to step 1.1 and repeat steps 1.1 to 1.4 $K$ times to obtain the $K$ membership matrices $\{U_1, U_2, \ldots, U_k, \ldots, U_K\}$.
3. The clustering ensemble method based on evidential reasoning for user behavior analysis according to claim 1, characterized in that step 2 specifically comprises:
Step 2.1, based on the membership matrix $U_k$ obtained in step 1, calculate the similarity matrix $SM_k$ of the $k$-th clustering result according to formula (5):
$SM_k = U_k (U_k)^T$ (5)
In formula (5), the element in row $i$ and column $j$ of the similarity matrix $SM_k$ denotes the joint membership of samples $x_i$ and $x_j$ in the same cluster center in the $k$-th clustering result;
Step 2.2, let a similarity vector be denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$; its values are the elements of the similarity matrix $SM$ above the diagonal, so it has $M$ elements;
Step 2.3, normalize the elements of the similarity vector $SV_k$ using formula (6);
In formula (6), the $m$-th element of the similarity vector $SV_k$, which has $M$ elements in total, is divided by the sum of all elements, and $p_{X,k}$ is the value of the element after normalization.
4. The clustering ensemble method based on evidential reasoning for user behavior analysis according to claim 1, characterized in that step 3 specifically comprises:
Step 3.1, let $w_k$ ($0 \le w_k \le 1$) and $R_k$ ($0 \le R_k \le 1$) denote the weight of the user behavior data $D_k$ and the credibility of the similarity vector $SV_k$, respectively, where $w_k = 0$ means "least important" and $w_k = 1$ means "most important", and $R_k = 0$ means "completely unreliable" and $R_k = 1$ means "completely reliable"; combine the weight $w_k$ and the credibility $R_k$ to obtain the hybrid weight of the $k$-th similarity vector according to formula (8);
In formula (8), $w_k$ denotes the weight of the user behavior data $D_k$; the earlier $D_k$ was generated, the smaller $w_k$ is;
$R_k$ is the credibility of the similarity vector $SV_k$; it is measured by the silhouette coefficient, a cluster evaluation index, and is calculated according to formula (9);
Here $a(i)$ denotes the average distance between sample $x_i$ and the other samples in the same cluster, calculated by formula (10);
$b(i)$ denotes the minimum, over the other clusters, of the average distance between sample $x_i$ and the samples of that cluster, calculated according to formula (11);
In formulas (10) and (11), $d(i, A)$ and $d(i, B)$ are calculated as Euclidean distances; $A$ denotes the set of samples in the same cluster as $x_i$, and $B$ denotes a set of samples in a different cluster from $x_i$;
Step 3.2, calculate the support of evidence $E(2)$ for $H$ using formula (12);
In formula (12), the two weight terms are the hybrid weights of the similarity vectors $SV_1$ and $SV_2$, and $p_{H,1}$ and $p_{H,2}$ denote the elements of $SV_1$ and $SV_2$, respectively;
Step 3.3, normalize all the supports obtained using formula (13) to obtain the belief degree that evidence $E(2)$ assigns to $H$;
In formula (13), the belief degree $p_{H,E(2)}$ is the value of the support after normalization, and the normalizing term is the sum of the supports;
Step 3.4, calculate the residual support of evidence $E(2)$ using formula (14);
Step 3.5, let the fusion result of the first $k$ similarity vectors be denoted $E(k)$, and calculate the support of $E(k)$ for $H$ according to formula (15);
In formula (15), $m_{H,E(k-1)}$ denotes the normalized support, obtained by substituting the initial value and iterating together with formula (16); $m_{p(\Omega),E(k-1)}$ denotes the normalized residual support, obtained by substituting the initial value into formula (18) and then substituting formulas (18) and (15) into formula (17) for iterative calculation;
Step 3.6, normalize the support according to formula (19) to obtain $p_{H,E(k)}$;
In formula (19), the normalizing term is the sum of the supports over all $H$, so that the belief degrees $p_{H,E(k)}$ sum to 1; through the above iterative evidential reasoning steps, the fusion result $SV^* = E(K)$ of the similarity vectors $\{SV_1, SV_2, \ldots, SV_K\}$ is finally obtained.
5. The clustering ensemble method based on evidential reasoning for user behavior analysis according to claim 1, characterized in that step 4 specifically comprises:
Step 4.1: treat each sample as its own cluster, so that T = N, where T is the number of clusters and N is the number of samples; the similarity between samples is given by the integrated similarity vector SV* produced by the evidential reasoning above;
Step 4.2: find the largest element max_SV* of the integrated similarity vector SV*, and merge the samples x_i and x_j that it represents into one cluster, denoted C_t;
Step 4.3: compute the similarity between this cluster and each of the other clusters using formula (20):
In formula (20), sim(x, x') denotes the similarity between a sample x from cluster C_s and a sample x' from cluster C_t, and corresponds one-to-one with the value of an element of the similarity vector SV*; |C_s| and |C_t| denote the numbers of samples in clusters C_s and C_t, respectively;
Step 4.4: if the number of clusters now equals T, stop the computation; otherwise repeat steps 4.2 and 4.3 until the final number of clusters reaches T.
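Steps 4.1 to 4.4 can be sketched in Python as follows. Formula (20) is not reproduced in this text; the sketch assumes it is the average of sim(x, x') over all pairs drawn from C_s and C_t, which is what the definitions of sim(x, x'), |C_s| and |C_t| suggest. It also assumes that SV* has been arranged as a symmetric N x N matrix and that the stopping count is passed separately as `target_clusters`, since the claim uses T for both the initial and the final cluster count. The function name and both parameters are hypothetical.

```python
import numpy as np

def agglomerate(sim, target_clusters):
    """Merge the most similar clusters until target_clusters remain (sketch).

    sim: symmetric (N, N) array of fused pairwise similarities
         (SV* arranged as a matrix, which is an assumption).
    Returns a list of clusters, each a sorted list of sample indices.
    """
    sim = np.asarray(sim, dtype=float)
    # Step 4.1: every sample starts as its own cluster.
    clusters = [[i] for i in range(sim.shape[0])]

    while len(clusters) > target_clusters:
        best, best_pair = -np.inf, None
        for s in range(len(clusters)):
            for t in range(s + 1, len(clusters)):
                # Assumed form of formula (20): mean of sim(x, x') over
                # x in C_s and x' in C_t. For two single-sample clusters
                # this is just the matrix element (step 4.2).
                avg = sim[np.ix_(clusters[s], clusters[t])].mean()
                if avg > best:
                    best, best_pair = avg, (s, t)
        s, t = best_pair
        merged = sorted(clusters[s] + clusters[t])
        # Replace the two merged clusters with their union.
        clusters = [c for i, c in enumerate(clusters) if i not in (s, t)]
        clusters.append(merged)

    return clusters
```

The pairwise search is quadratic in the number of clusters at every merge, which keeps the sketch short but would be slow for large N.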
CN201810814178.8A 2018-07-23 2018-07-23 Evidence reasoning-based integrated clustering method for user behavior analysis Active CN109002858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810814178.8A CN109002858B (en) 2018-07-23 2018-07-23 Evidence reasoning-based integrated clustering method for user behavior analysis

Publications (2)

Publication Number Publication Date
CN109002858A (en) 2018-12-14
CN109002858B (en) 2022-01-28

Family

ID=64596928

Country Status (1)

Country Link
CN (1) CN109002858B (en)

Also Published As

Publication number Publication date
CN109002858B (en) 2022-01-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant