CN108416380A

CN108416380A - A big data clustering algorithm for reducing customer churn risk

Info

Publication number: CN108416380A
Application number: CN201810170341.1A
Authority: CN
Inventors: 李果
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2018-02-28
Filing date: 2018-02-28
Publication date: 2018-08-17

Abstract

The present invention relates to a big data clustering algorithm for reducing customer churn risk. The method comprises the following steps: (1) relevant attributes are selected using axiomatic fuzzy set (AFS) theory, and fuzzy concepts are expressed through their membership functions and logical operations; (2) the neighborhood radius and weight coefficient of the subtractive clustering method (SCM) are determined automatically from the computed membership degrees; (3) the number of clusters and the cluster centroids are computed with the subtractive clustering method by selecting and updating the mountain function, whereby the subtractive clustering method and axiomatic fuzzy sets are integrated into a semantics-driven subtractive clustering method (SDSCM); (4) the clusters are computed with the K-means algorithm from the cluster centroids obtained by the semantics-driven subtractive clustering method. Because SDSCM builds on both the subtractive clustering method and axiomatic fuzzy sets, it improves the clustering accuracy of subtractive clustering and K-means, and using this new algorithm reduces the risk of inaccurate operations management based on axiomatic fuzzy sets (AFS).

Description

A big data clustering algorithm for reducing customer churn risk

Technical field

The present invention relates to a clustering algorithm, namely the semantics-driven subtractive clustering method (SDSCM), and more particularly to a big data clustering algorithm for reducing customer churn risk.

Background art

As market competition intensifies, customer churn management has become an important means for enterprises to maintain a competitive advantage. Many big-data-based algorithms for customer churn prediction exist, but none of them predicts churn well, so decision makers cannot rely on them for accurate operations management. A reliable big data clustering algorithm for reducing customer churn risk is lacking. The present invention provides a new method that helps companies reduce customer churn risk and thereby obtain higher profit.

Summary of the invention

To solve the problems of the prior art, the present invention discloses a big data clustering algorithm for reducing customer churn risk. By effectively mining customers' unstructured social data and maximizing the value of telecom big data, the invention derives an effective big data semantic subtractive clustering algorithm. The specific scheme is as follows:

A big data clustering algorithm for reducing customer churn risk, the method comprising the following steps:

(1) relevant attributes are selected using axiomatic fuzzy set theory, and fuzzy concepts are expressed through their membership functions and logical operations;

(2) the neighborhood radius and weight coefficient of the subtractive clustering method are determined automatically from the computed membership degrees;

(3) the number of clusters and the cluster centroids are computed with the subtractive clustering method by selecting and updating the mountain function, whereby the subtractive clustering method and axiomatic fuzzy sets are integrated into a semantics-driven subtractive clustering method;

(4) the clusters are computed with the K-means algorithm from the cluster centroids obtained by the semantics-driven subtractive clustering method.

Further, the clustering algorithm specifically comprises the following steps:

Step 1: For the fuzzy concept provided by the user, compute its membership degree with formula (1).

Step 2: Compute the sum of absolute differences of μ_η(x_i):

Step 3: Select the minimum value as the first cluster centroid.

Step 4: Compute the Euclidean distance between the first cluster centroid and the other data points. The neighborhood radius, which determines the range of influence of a cluster centroid, is the variance of these distances.

Step 5: To avoid obtaining cluster centroids that lie close together, set the weight coefficient. Once the parameters have been determined automatically, compute the cluster centroids with the SCM algorithm.

Step 6: Set l = 1 and compute the mountain function of x_i.

Step 7: Select the maximum of the mountain function.

At the same time, let x_i be the first centroid.

Step 8: Set l = l + 1 and update the mountain function of each data vector according to the following formula.

Step 9: Select the data point with the largest updated mountain function value as the second centroid, and repeat Step 6 until the following condition is met:

Here, ε is a positive constant less than 1. When the ratio falls below ε, the iteration stops.

Step 10: Finally, output the cluster centroids.

Compared with the prior art, the present invention has the following advantages:

The semantics-driven subtractive clustering method (SDSCM), based on the subtractive clustering method and axiomatic fuzzy sets, improves the clustering accuracy of subtractive clustering and K-means. Using this new algorithm reduces the risk of inaccurate operations management based on axiomatic fuzzy sets (AFS).

Brief description of the drawings

Fig. 1 is a schematic flow chart of the big data clustering algorithm for reducing customer churn risk according to the present invention.

Detailed description

The specific embodiments of the disclosed big data clustering algorithm for reducing customer churn risk are described in detail below with reference to the accompanying drawing. The description is not intended to limit the scope of the invention.

The present invention draws on the following theories:

(1) Axiomatic fuzzy sets (AFS). AFS theory is a new semantic approach to processing fuzzy information. In essence, it studies how the inherent laws or patterns hidden in training data or databases can be transformed into fuzzy sets and their logical operations. Membership functions and their logical operations are determined by the raw data and facts rather than by intuition. By imitating the way humans perceive and observe things, form concepts, and then reason logically, AFS treats fuzzy concepts and their logical operations at a more abstract and general level. AFS theory mainly comprises two parts, AFS algebra and AFS structure: AFS algebra mainly studies the logical operations on concepts, while AFS structure automatically derives the membership function of a fuzzy concept from the distribution of the data and the semantics of the concept.
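As a rough illustration of a membership function that comes from the data distribution rather than from intuition, the sketch below uses a simplified rank-based membership for a simple concept such as "this attribute is large". It is not the full AFS construction, which is more general. The names `simple_membership` and `fuzzy_and`, the uniform weighting of samples, the use of the pointwise minimum for conjunction, and the sample data are all assumptions made for illustration.

```python
import numpy as np

def simple_membership(values):
    """Rank-based membership for the concept 'this value is large'.

    Hypothetical simplification: the membership of sample i is the
    fraction of samples whose value is less than or equal to values[i].
    """
    values = np.asarray(values, dtype=float)
    # le[i, j] is True when values[j] <= values[i].
    le = values[None, :] <= values[:, None]
    return le.mean(axis=1)

def fuzzy_and(*memberships):
    """Conjunction of concepts, taken here as the pointwise minimum (an assumption)."""
    return np.minimum.reduce([np.asarray(m, dtype=float) for m in memberships])

# Illustrative data only, not taken from the patent.
monthly_calls = [10, 50, 20, 40]
complaints = [3, 0, 1, 2]
mu_calls = simple_membership(monthly_calls)
mu_complaints = simple_membership(complaints)
mu_both = fuzzy_and(mu_calls, mu_complaints)
```

Here `mu_both` plays the role of the membership degree of a compound concept ("many calls and many complaints"), and a membership vector of this kind is what the later steps consume.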

(2) Subtractive clustering method (SCM). Subtractive clustering is a density-based clustering algorithm. It treats each data point as a potential cluster center; after a cluster center has been found, its influence is subtracted and the search for the next cluster center begins again. Subtractive clustering is an unsupervised learning method, introduced here to compute the cluster centroids; it can quickly determine the number of clusters and their centroids from the raw data.
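The find-then-subtract loop described above can be sketched with the commonly used formulation of subtractive clustering, in which the mountain function of a point is a sum of Gaussian kernels over all data points. The formulas in this text are not reproduced, so this is a sketch of the standard method under stated assumptions, not necessarily the exact variant claimed. The function name, the default values of `weight` and `eps`, and the synthetic data are assumptions.

```python
import numpy as np

def subtractive_clustering(X, radius, weight=1.5, eps=0.15, max_centers=50):
    """Standard subtractive clustering, shown as a sketch.

    radius : neighborhood radius r_a
    weight : coefficient, so the suppression radius is weight * r_a
    eps    : stop when the current peak falls below eps * first peak
    """
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distance between every pair of points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    alpha = 4.0 / radius ** 2
    beta = 4.0 / (weight * radius) ** 2
    # Mountain function: high where many points are close by.
    mountain = np.exp(-alpha * d2).sum(axis=1)
    first_peak = mountain.max()
    centers = []
    while len(centers) < max_centers:
        k = int(np.argmax(mountain))
        peak = mountain[k]
        if peak / first_peak < eps:
            break
        centers.append(X[k])
        # Subtract the influence of the new centroid from every point.
        mountain = mountain - peak * np.exp(-beta * d2[:, k])
    return np.array(centers)

# Two well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(30, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(30, 2))
centers = subtractive_clustering(np.vstack([blob_a, blob_b]), radius=1.0)
```

A `weight` greater than 1 makes the suppression kernel wider than the mountain kernel, which is what keeps a second centroid from being picked right next to the first.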

(3) K-means algorithm. K-means is a typical distance-based clustering algorithm. It uses distance as the measure of similarity: the closer two objects are, the more similar they are considered to be. The algorithm regards a cluster as a set of objects that are close to one another, so its final goal is to obtain compact and well-separated clusters. The present invention therefore uses the K-means algorithm to compute the clusters. Note that if the initial parameter values of K-means are poorly chosen, the clustering result may be inaccurate. The subtractive clustering method (SCM), by contrast, can generate more accurate input parameters from the raw data, including the cluster centroids and the number of clusters. The present invention therefore passes the parameters generated by subtractive clustering to the K-means algorithm to improve its accuracy. Starting from the initialized cluster centroids, K-means iteratively assigns each data point to the nearest cluster and recomputes the cluster centroids until a termination condition is met.
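The hand-off described above, where K-means starts from centroids supplied by another method instead of random ones, can be sketched as a plain Lloyd-style iteration. The function name `kmeans_from_centroids`, the tolerance, and the toy data are assumptions made for illustration.

```python
import numpy as np

def kmeans_from_centroids(X, init_centroids, max_iter=100, tol=1e-8):
    """Lloyd-style K-means started from given centroids (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: mean of the assigned points; an empty cluster keeps its centroid.
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members) > 0:
                new_centroids[k] = members.mean(axis=0)
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        # Termination condition: centroids have stopped moving.
        if shift < tol:
            break
    return centroids, labels

points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
seeds = np.array([[0.0, 0.0], [10.0, 10.0]])  # e.g. centroids found by SCM
final_centroids, labels = kmeans_from_centroids(points, seeds)
```

Because the number of seeds fixes the number of clusters, passing in the SCM centroids supplies both of the K-means inputs mentioned in the text.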

By integrating axiomatic fuzzy sets (AFS) with the subtractive clustering method (SCM), the present invention forms a new algorithm, the semantics-driven subtractive clustering method (SDSCM). The process is as follows:

(1) Relevant attributes are selected using axiomatic fuzzy sets (AFS), and fuzzy concepts are expressed through their membership functions and logical operations.

(2) The neighborhood radius and weight coefficient of the subtractive clustering method (SCM) are determined automatically from the computed membership degrees.

(3) The number of clusters and the cluster centroids are computed with the subtractive clustering method by selecting and updating the mountain function. The subtractive clustering method and axiomatic fuzzy sets are thereby integrated into the Semantic Driven Subtractive Clustering Method (SDSCM).

The details of the SDSCM algorithm are as follows.

Symbols used in the algorithm

Step 2: Compute the sum of absolute differences of μ_η(x_i):

Step 3: Select the minimum value as the first cluster centroid.

Step 4: Compute the Euclidean distance between the first cluster centroid and the other data points. The neighborhood radius, which determines the range of influence of a cluster centroid, is the variance of these distances.

Step 5: To avoid obtaining cluster centroids that lie close together, set the weight coefficient. Once the parameters have been determined automatically, compute the cluster centroids with the SCM algorithm.

Step 6: Set l = 1 and compute the mountain function of x_i.

Step 7: Select the maximum of the mountain function.

At the same time, let x_i be the first centroid.

Step 8: Set l = l + 1 and update the mountain function of each data vector according to the following formula.

Here, ε is a positive constant less than 1. When the ratio falls below ε, the iteration stops.

Step 10: Finally, output the cluster centroids.
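Steps 2 to 4 can be sketched as follows. The formulas for these steps are not reproduced in this text, so the sketch rests on one plausible reading: the quantity in Step 2 is taken to be S_i = Σ_j |μ(x_i) − μ(x_j)| for each sample, and the variance in Step 4 is taken to be the population variance of the distances. The function name `sdscm_parameters` and the toy data are hypothetical. The weight coefficient of Step 5 is left out because the text does not say how it is computed.

```python
import numpy as np

def sdscm_parameters(X, membership):
    """Sketch of Steps 2-4 under one reading of the text (see assumptions above)."""
    X = np.asarray(X, dtype=float)
    mu = np.asarray(membership, dtype=float)
    # Step 2 (assumed form): S_i = sum_j |mu_i - mu_j| for every sample i.
    abs_diff_sum = np.abs(mu[:, None] - mu[None, :]).sum(axis=1)
    # Step 3: the sample with the smallest sum is the first cluster centroid.
    first = int(np.argmin(abs_diff_sum))
    # Step 4: Euclidean distances from that centroid to all other points;
    # the neighborhood radius is taken as the variance of these distances.
    others = np.delete(X, first, axis=0)
    dists = np.linalg.norm(others - X[first], axis=1)
    radius = dists.var()
    return first, radius

# Illustrative data only.
X = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 8.0]])
membership = np.array([0.1, 0.5, 0.6])
first, radius = sdscm_parameters(X, membership)
```

In this toy case the second sample has the most "typical" membership degree, so it becomes the first centroid; the resulting radius would then be handed to the subtractive clustering stage of Steps 5 to 10.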

The foregoing is only a preferred embodiment of the present invention. The numerical values and numerical ranges mentioned in the above description merely illustrate preferred embodiments and are not intended to limit the invention. For those skilled in the art, the invention may be modified and varied in various ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A big data clustering algorithm for reducing customer churn risk, the method comprising the following steps:

(1) relevant attributes are selected using axiomatic fuzzy set theory, and fuzzy concepts are expressed through their membership functions and logical operations;

(3) the number of clusters and the cluster centroids are computed with the subtractive clustering method by selecting and updating the mountain function, whereby the subtractive clustering method and axiomatic fuzzy sets are integrated into a semantics-driven subtractive clustering method;

2. The big data clustering algorithm for reducing customer churn risk according to claim 1, characterized in that the clustering algorithm specifically comprises the following steps:

Step 2: compute the sum of absolute differences of μ_η(x_i):

Step 3: select the minimum value as the first cluster centroid;

Step 4: compute the Euclidean distance between the first cluster centroid and the other data points, the neighborhood radius, which determines the range of influence of a cluster centroid, being the variance of these distances;

Step 5: to avoid obtaining cluster centroids that lie close together, set the weight coefficient and, once the parameters have been determined automatically, compute the cluster centroids with the SCM algorithm;

Step 6: set l = 1 and compute the mountain function of x_i;

Step 7: select the maximum of the mountain function;

at the same time, let x_i be the first centroid;

Step 10: finally, output the cluster centroids.