CN108416380A - A kind of big data clustering algorithm reducing customer churn risk - Google Patents
A kind of big data clustering algorithm reducing customer churn risk
- Publication number
- CN108416380A
- Authority
- CN
- China
- Prior art keywords
- barycenter
- cluster
- clustering algorithm
- clustering
- subtraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a big data clustering algorithm for reducing customer churn risk. The method comprises the following steps: (1) select relevant attributes using axiomatic fuzzy set (AFS) theory, and express fuzzy concepts through their membership functions and logical operations; (2) automatically determine the neighbourhood radius and weight coefficient of the subtractive clustering method (SCM) from the calculated membership degrees; (3) use SCM to calculate the number of clusters and the cluster centroids by selecting and updating the mountain function, thereby integrating SCM and AFS into the semantic-driven subtractive clustering method (SDSCM); (4) use the K-means algorithm to calculate the clusters from the cluster centroids obtained by SDSCM. Because it builds on both SCM and AFS, SDSCM improves the clustering accuracy of SCM and K-means, and using this new algorithm reduces the risk of inaccuracy in AFS-based operational management.
Description
Technical field
The invention relates to a clustering algorithm, namely the semantic-driven subtractive clustering method (SDSCM), and more particularly to a big data clustering algorithm for reducing customer churn risk.
Background art
As market competition intensifies, customer churn management has become an important source of competitive advantage for enterprises. Many big-data-based algorithms for customer churn prediction already exist, but none predicts churn well, so decision makers cannot rely on them for accurate operational management. A reliable big data clustering algorithm for reducing customer churn risk is lacking. The invention provides a new method that helps companies reduce customer churn risk and thereby earn higher profits.
Summary of the invention
To solve the problems of the prior art, the invention discloses a big data clustering algorithm for reducing customer churn risk. By effectively mining customers' unstructured social data, the algorithm maximizes the value of telecom big data and derives an effective semantic-driven subtractive clustering algorithm for big data. The specific scheme is as follows:
A big data clustering algorithm for reducing customer churn risk, the method comprising the following steps:
(1) select relevant attributes using axiomatic fuzzy set (AFS) theory, and express fuzzy concepts through their membership functions and logical operations;
(2) automatically determine the neighbourhood radius and weight coefficient of the subtractive clustering method (SCM) from the calculated membership degrees;
(3) use SCM to calculate the number of clusters and the cluster centroids by selecting and updating the mountain function, thereby integrating SCM and AFS into the semantic-driven subtractive clustering method (SDSCM);
(4) use the K-means algorithm to calculate the clusters from the cluster centroids obtained by SDSCM.
Further, the clustering algorithm specifically comprises the following steps:
Step 1: For the fuzzy concept provided by the user, calculate its membership degree with formula (1).
Step 2: Calculate the sum of the absolute differences of μ_η(x_i):
Step 3: Select the data point with the minimum value as the first cluster centroid,
Step 4: Calculate the Euclidean distances between the first cluster centroid and the other data points; the neighbourhood radius, which determines the range of influence of a cluster centroid, is the variance of these distances,
Step 5: To avoid obtaining cluster centroids that lie too close together, set the weight coefficient; once the parameters have been determined automatically, use the SCM algorithm to calculate the cluster centroids,
Step 6: Set l = 1 and calculate the mountain function of x_i,
Step 7: Select the maximum mountain function value,
and take the corresponding x_i as the first centroid
Step 8: Set l = l + 1 and update the mountain function of each data vector according to the following formula,
Step 9: Select the data point associated with the larger value as the second centroid, and repeat from step 6 until the following condition is met:
where ε is a positive constant less than 1. When the ratio is less than ε, stop iterating,
Step 10: Finally, output the cluster centroids.
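The formulas that steps 6 to 9 refer to are not reproduced in this text. For orientation only, the subtractive clustering method as it is commonly formulated (following Chiu) can be written as below; this is a sketch under that assumption, not a reconstruction of the patent's own equations. The symbols r_a (neighbourhood radius), r_b (a second, usually larger radius used in the subtraction) and the starred quantities are assumed notation, and how the patent's weight coefficient enters is not recoverable from the text.

```latex
\begin{align*}
% Mountain (potential) function of data point x_i over n data points
M_1(x_i) &= \sum_{j=1}^{n} \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{(r_a/2)^2}\right) \\
% l-th centroid: the point with the largest current mountain value
x_l^{*} &= \arg\max_{x_i} M_l(x_i), \qquad M_l^{*} = M_l(x_l^{*}) \\
% Update: subtract the influence of the centroid just selected
M_{l+1}(x_i) &= M_l(x_i) - M_l^{*} \exp\!\left(-\frac{\lVert x_i - x_l^{*} \rVert^2}{(r_b/2)^2}\right) \\
% Stopping rule, with 0 < \varepsilon < 1
\frac{M_l^{*}}{M_1^{*}} &< \varepsilon
\end{align*}
```

Under this formulation the mountain value of a selected centroid drops to zero after the update, so the same point cannot be chosen twice.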
Compared with the prior art, the invention has the following advantages:
The semantic-driven subtractive clustering method (SDSCM), based on the subtractive clustering method and axiomatic fuzzy sets, improves the clustering accuracy of subtractive clustering and K-means. Using this new algorithm reduces the risk of inaccuracy in operational management based on axiomatic fuzzy sets (AFS).
Brief description of the drawings
Fig. 1 is a schematic flow chart of the big data clustering algorithm for reducing customer churn risk according to the invention.
Detailed description of the embodiments
The specific embodiments of the big data clustering algorithm for reducing customer churn risk disclosed by the invention are described in detail below with reference to the accompanying drawing; they are not intended to limit the scope of the invention.
The invention draws on the following theories:
(1) Axiomatic fuzzy sets (AFS). AFS theory is a new semantic method for processing fuzzy information. In essence, it studies how to convert the inherent laws or patterns hidden in training data or databases into fuzzy sets and their logical operations. Membership functions and their logical operations are determined by the raw data and facts rather than by intuition, imitating the mechanism by which humans perceive and observe things, form concepts, and then produce logic. Fuzzy concepts and their logical operations are thus studied at a more abstract and general level. AFS theory mainly comprises two parts, AFS algebra and AFS structure: AFS algebra mainly studies the logical operations of concepts, while the AFS structure automatically provides the membership functions of fuzzy concepts from the distribution of the data and the semantics of the fuzzy concepts.
(2) Subtractive clustering method (SCM). Subtractive clustering is a density-based clustering algorithm. It treats every data point as a potential cluster centre, then subtracts the influence of each cluster centre already found and searches for the next cluster centre. The invention introduces subtractive clustering, an unsupervised learning method, to calculate the cluster centroids; it can quickly determine the number of clusters and the centroids from the raw data.
(3) K-means algorithm. K-means is a typical distance-based clustering algorithm. It uses distance as the measure of similarity: the closer two objects are, the more similar they are considered to be. The algorithm regards a cluster as a set of objects that lie close together, so its final goal is to obtain compact and well-separated clusters. The invention therefore uses K-means to calculate the clusters. Note that if the initial parameter values of K-means are poorly chosen, the clustering result may be inaccurate. The subtractive clustering method (SCM), by contrast, can generate more accurate input parameters from the raw data, including the cluster centroids and the number of clusters. The invention therefore passes the parameters generated by SCM to K-means in order to improve the accuracy of K-means. Starting from the initial cluster centroids, K-means iteratively assigns each data point to its nearest cluster and recalculates the cluster centroids until a termination condition is reached.
By integrating axiomatic fuzzy sets (AFS) with the subtractive clustering method (SCM), the invention forms a new algorithm, the semantic-driven subtractive clustering method (SDSCM). The process is as follows:
(1) Select relevant attributes using axiomatic fuzzy sets (AFS), and express fuzzy concepts through their membership functions and logical operations.
(2) Automatically determine the neighbourhood radius and weight coefficient of the subtractive clustering method (SCM) from the calculated membership degrees.
(3) Use SCM to calculate the number of clusters and the cluster centroids by selecting and updating the mountain function, thereby integrating SCM and AFS into the semantic-driven subtractive clustering method (Semantic Driven Subtractive Clustering Method, SDSCM).
(4) Use the K-means algorithm to calculate the clusters from the cluster centroids obtained by SDSCM.
The details of the SDSCM algorithm are as follows.
Symbols used in the algorithm
Step 1: For the fuzzy concept provided by the user, calculate its membership degree with formula (1).
Step 2: Calculate the sum of the absolute differences of μ_η(x_i):
Step 3: Select the data point with the minimum value as the first cluster centroid.
Step 4: Calculate the Euclidean distances between the first cluster centroid and the other data points. The neighbourhood radius, which determines the range of influence of a cluster centroid, is the variance of these distances.
Step 5: To avoid obtaining cluster centroids that lie too close together, set the weight coefficient. Once the parameters have been determined automatically, use the SCM algorithm to calculate the cluster centroids.
Step 6: Set l = 1 and calculate the mountain function of x_i.
Step 7: Select the maximum mountain function value,
and take the corresponding x_i as the first centroid
Step 8: Set l = l + 1 and update the mountain function of each data vector according to the following formula
Step 9: Select the data point associated with the larger value as the second centroid, and repeat from step 6 until the following condition is met:
where ε is a positive constant less than 1. When the ratio is less than ε, stop iterating.
Step 10: Finally, output the cluster centroids.
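Steps 2 to 4 above can be sketched in a few lines. The formulas are not reproduced in this text, so the sketch rests on one assumed reading: for each data point, the "sum of absolute differences" is taken between its membership degree and the membership degrees of all other points. The function name `sdscm_parameters` and the toy inputs are hypothetical, the membership degrees are taken as given rather than computed from formula (1), and the weight coefficient of step 5 is left out because its formula is not recoverable.

```python
import numpy as np

def sdscm_parameters(X, membership):
    """Assumed reading of steps 2-4: first centroid index and neighbourhood radius."""
    X = np.asarray(X, dtype=float)
    mu = np.asarray(membership, dtype=float)
    # Step 2: for each point, sum of absolute membership differences to all points.
    abs_diff_sum = np.abs(mu[:, None] - mu[None, :]).sum(axis=1)
    # Step 3: the point with the smallest sum becomes the first cluster centroid.
    first = int(abs_diff_sum.argmin())
    # Step 4: Euclidean distances from the first centroid to the other points;
    # the neighbourhood radius is the (population) variance of those distances.
    others = np.delete(X, first, axis=0)
    dist = np.linalg.norm(others - X[first], axis=1)
    radius = float(dist.var())
    return first, radius

# Toy example (made-up values): three points on a line with given memberships.
X = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
membership = [0.1, 0.5, 0.9]
first, radius = sdscm_parameters(X, membership)
print(first, radius)
```

Under this reading, the point chosen in step 3 is the one whose membership degree is a median of all membership degrees, since a median minimizes the sum of absolute differences.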
The foregoing is only a preferred embodiment of the invention. The numerical values and ranges mentioned in the above description merely illustrate preferred embodiments and are not intended to limit the invention. For those skilled in the art, the invention may be modified and varied in various ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (2)
1. A big data clustering algorithm for reducing customer churn risk, the method comprising the following steps:
(1) select relevant attributes using axiomatic fuzzy set (AFS) theory, and express fuzzy concepts through their membership functions and logical operations;
(2) automatically determine the neighbourhood radius and weight coefficient of the subtractive clustering method (SCM) from the calculated membership degrees;
(3) use SCM to calculate the number of clusters and the cluster centroids by selecting and updating the mountain function, thereby integrating SCM and AFS into the semantic-driven subtractive clustering method (SDSCM);
(4) use the K-means algorithm to calculate the clusters from the cluster centroids obtained by SDSCM.
2. The big data clustering algorithm for reducing customer churn risk according to claim 1, characterized in that the clustering algorithm specifically comprises the following steps:
Step 1: For the fuzzy concept provided by the user, calculate its membership degree with formula (1).
Step 2: Calculate the sum of the absolute differences of μ_η(x_i):
Step 3: Select the data point with the minimum value as the first cluster centroid,
Step 4: Calculate the Euclidean distances between the first cluster centroid and the other data points; the neighbourhood radius, which determines the range of influence of a cluster centroid, is the variance of these distances,
Step 5: To avoid obtaining cluster centroids that lie too close together, set the weight coefficient; once the parameters have been determined automatically, use the SCM algorithm to calculate the cluster centroids,
Step 6: Set l = 1 and calculate the mountain function of x_i,
Step 7: Select the maximum mountain function value,
and take the corresponding x_i as the first centroid
Step 8: Set l = l + 1 and update the mountain function of each data vector according to the following formula,
Step 9: Select the data point associated with the larger value as the second centroid, and repeat from step 6 until the following condition is met:
where ε is a positive constant less than 1. When the ratio is less than ε, stop iterating,
Step 10: Finally, output the cluster centroids.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810170341.1A CN108416380A (en) | 2018-02-28 | 2018-02-28 | A kind of big data clustering algorithm reducing customer churn risk |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810170341.1A CN108416380A (en) | 2018-02-28 | 2018-02-28 | A kind of big data clustering algorithm reducing customer churn risk |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108416380A true CN108416380A (en) | 2018-08-17 |
Family
ID=63129655
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810170341.1A Pending CN108416380A (en) | 2018-02-28 | 2018-02-28 | A kind of big data clustering algorithm reducing customer churn risk |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108416380A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109064470A (en) * | 2018-08-28 | 2018-12-21 | 河南工业大学 | A kind of image partition method and device based on adaptive fuzzy clustering |
WO2024007580A1 (en) * | 2022-07-07 | 2024-01-11 | 南京国电南自电网自动化有限公司 | Power equipment parallel fault diagnosis method and apparatus based on hybrid clustering |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109064470A (en) * | 2018-08-28 | 2018-12-21 | 河南工业大学 | A kind of image partition method and device based on adaptive fuzzy clustering |
CN109064470B (en) * | 2018-08-28 | 2022-02-22 | 河南工业大学 | Image segmentation method and device based on self-adaptive fuzzy clustering |
WO2024007580A1 (en) * | 2022-07-07 | 2024-01-11 | 南京国电南自电网自动化有限公司 | Power equipment parallel fault diagnosis method and apparatus based on hybrid clustering |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110969290B (en) | Runoff probability prediction method and system based on deep learning | |
CN107992976B (en) | Hot topic early development trend prediction system and prediction method | |
WO2021120934A1 (en) | Convolutional neural network-based method for automatically grouping drgs | |
Poczęta et al. | Learning fuzzy cognitive maps using structure optimization genetic algorithm | |
CN109214503B (en) | Power transmission and transformation project cost prediction method based on KPCA-LA-RBM | |
CN109344994A (en) | A kind of prediction model method based on improvement moth optimization algorithm | |
CN109492748B (en) | Method for establishing medium-and-long-term load prediction model of power system based on convolutional neural network | |
CN113283924B (en) | Demand prediction method and demand prediction device | |
CN108710609A (en) | A kind of analysis method of social platform user information based on multi-feature fusion | |
CN112733996A (en) | GA-PSO (genetic Algorithm-particle swarm optimization) based hydrological time sequence prediction method for optimizing XGboost | |
CN108960486B (en) | Interactive set evolution method for predicting adaptive value based on gray support vector regression | |
CN116307215A (en) | Load prediction method, device, equipment and storage medium of power system | |
CN110930030B (en) | Doctor skill level rating method | |
CN108416380A (en) | A kind of big data clustering algorithm reducing customer churn risk | |
CN116187835A (en) | Data-driven-based method and system for estimating theoretical line loss interval of transformer area | |
WO2015145978A1 (en) | Energy-amount estimation device, energy-amount estimation method, and recording medium | |
CN109460608A (en) | A method of the high gradient slope deformation prediction based on Fuzzy time sequence | |
CN116976529A (en) | Cross-river-basin water diversion method and system based on supply-demand prediction dynamic correction | |
CN111882114A (en) | Short-term traffic flow prediction model construction method and prediction method | |
CN111027841A (en) | Low-voltage transformer area line loss calculation method based on gradient lifting decision tree | |
Wang et al. | A new time series prediction method based on complex network theory | |
Sang et al. | Ensembles of gradient boosting recurrent neural network for time series data prediction | |
CN117034116A (en) | Machine learning-based traditional village space type identification method | |
CN109902870A (en) | Electric grid investment prediction technique based on AdaBoost regression tree model | |
CN117194966A (en) | Training method and related device for object classification model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180817 |