CN108416380A - A big data clustering algorithm for reducing customer churn risk - Google Patents

A big data clustering algorithm for reducing customer churn risk

Publication number
CN108416380A
CN108416380A CN201810170341.1A CN201810170341A CN108416380A CN 108416380 A CN108416380 A CN 108416380A CN 201810170341 A CN201810170341 A CN 201810170341A CN 108416380 A CN108416380 A CN 108416380A
Authority
CN
China
Prior art keywords
barycenter
cluster
clustering algorithm
clustering
subtraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810170341.1A
Other languages
Chinese (zh)
Inventor
李果
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201810170341.1A
Publication of CN108416380A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a big data clustering algorithm for reducing customer churn risk. The method comprises the following steps: (1) selecting relevant attributes using axiomatic fuzzy set theory, and expressing fuzzy concepts with their membership functions and logical operations; (2) automatically determining the neighborhood radius and weight coefficient of the subtractive clustering algorithm from the computed membership degrees; (3) computing the number of clusters and the centroids with the subtractive clustering algorithm by selecting and updating the mountain function, whereby the subtractive clustering algorithm and axiomatic fuzzy sets are integrated into a semantic-driven subtractive clustering method; (4) computing the clusters with the K-means algorithm from the cluster centroids obtained by the semantic-driven subtractive clustering method. The semantic-driven subtractive clustering method (SDSCM), based on the subtractive clustering algorithm and axiomatic fuzzy sets, improves the clustering accuracy of subtractive clustering and K-means, and by using axiomatic fuzzy sets (AFS) the new algorithm reduces the risk of inaccurate operations management.

Description

A big data clustering algorithm for reducing customer churn risk
Technical field
The present invention relates to a clustering algorithm, namely the semantic-driven subtractive clustering method (SDSCM), and more particularly to a big data clustering algorithm for reducing the risk of customer churn.
Background
As market competition intensifies, customer churn management has become an important means for enterprises to gain competitive advantage. Many big-data-based algorithms for customer churn prediction exist, but none predicts churn well, and decision makers cannot rely on them for accurate operations management. A reliable big data clustering algorithm for reducing customer churn risk is lacking. The present invention provides a new method that helps companies reduce customer churn risk and thereby obtain higher profits.
Summary of the invention
To solve the problems of the prior art, the present invention discloses a big data clustering algorithm for reducing customer churn risk. By effectively mining customers' unstructured social data, the algorithm maximizes the value of telecom big data and yields an effective semantic-driven subtractive clustering algorithm for big data. The specific scheme is as follows:
A big data clustering algorithm for reducing customer churn risk, the method comprising the following steps:
(1) selecting relevant attributes using axiomatic fuzzy set theory, and expressing fuzzy concepts with their membership functions and logical operations;
(2) automatically determining the neighborhood radius and weight coefficient of the subtractive clustering algorithm from the computed membership degrees;
(3) computing the number of clusters and the centroids with the subtractive clustering algorithm by selecting and updating the mountain function, whereby the subtractive clustering algorithm and axiomatic fuzzy sets are integrated into a semantic-driven subtractive clustering method;
(4) computing the clusters with the K-means algorithm from the cluster centroids obtained by the semantic-driven subtractive clustering method.
Further, the clustering algorithm specifically comprises the following steps:
Step 1: For the fuzzy concept supplied by the user, compute its membership degree with formula (1).
Step 2: Compute the sum of absolute differences of μ_η(x_i):
Step 3: Select the data point with the minimum value as the first cluster centroid.
Step 4: Compute the Euclidean distance between the first cluster centroid and every other data point. The neighborhood radius, which determines the range of influence of a cluster centroid, is the variance of these distances.
Step 5: To avoid obtaining cluster centroids that lie close together, set the weight coefficient. Once the parameters have been determined automatically, use the SCM algorithm to compute the cluster centroids.
Step 6: Set l = 1 and compute the mountain function of x_i.
Step 7: Select the maximum mountain function value,
and let the corresponding x_i be the first centroid.
Step 8: Set l = l + 1 and update the mountain function of each data vector according to the following formula.
Step 9: Select the data point associated with the largest mountain function value as the second centroid, and repeat step 6 until the following condition is satisfied:
where ε is a positive constant less than 1. Iteration stops when the ratio falls below ε.
Step 10: Finally, output the cluster centroids.
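For orientation, steps 6 to 9 can be read against the standard subtractive clustering formulation. The equations below are that textbook form, not the patent's own formulas; the symbols r_a (neighborhood radius), w (weight coefficient, so that r_b = w r_a), M_l (mountain function at iteration l) and x_l^* (the l-th centroid) are assumed notation.

```latex
% Step 6: initial mountain function of each data point (assumed Gaussian form)
M_1(x_i) = \sum_{j=1}^{n} \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{(r_a/2)^2} \right)

% Step 7: the first centroid is the point with the largest mountain value
x_1^* = \arg\max_{x_i} M_1(x_i), \qquad M_1^* = M_1(x_1^*)

% Step 8: subtract the influence of the previous centroid, with r_b = w \, r_a
M_l(x_i) = M_{l-1}(x_i) - M_{l-1}^* \exp\!\left( -\frac{\lVert x_i - x_{l-1}^* \rVert^2}{(r_b/2)^2} \right)

% Step 9: stop once the current peak is small relative to the first peak
\frac{M_l^*}{M_1^*} < \varepsilon, \qquad 0 < \varepsilon < 1
```

Under this reading, a weight coefficient w greater than 1 widens the subtracted region, which is what keeps later centroids from landing next to earlier ones.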
Compared with the prior art, the present invention has the following advantages:
The semantic-driven subtractive clustering method (SDSCM), based on the subtractive clustering algorithm and axiomatic fuzzy sets, improves the clustering accuracy of subtractive clustering and K-means, and by using axiomatic fuzzy sets (AFS) the new algorithm reduces the risk of inaccurate operations management.
Description of the drawings
Fig. 1 is a schematic flowchart of the big data clustering algorithm for reducing customer churn risk according to the present invention.
Detailed description
The specific embodiments of the big data clustering algorithm for reducing customer churn risk disclosed by the invention are described in detail below with reference to the accompanying drawing; they are not intended to limit the scope of the invention.
The present invention draws on the following theories:
(1) Axiomatic fuzzy sets (AFS). AFS theory is a new semantic method for handling fuzzy information. In essence, it studies how to transform the inherent laws or patterns contained in training data or databases into fuzzy sets and their logical operations. Membership functions and their logical operations are determined by the raw data and facts rather than by intuition. AFS imitates the mechanism by which humans perceive and observe things, form concepts, and generate logic, and it treats fuzzy concepts and their logical operations at a more abstract and general level. AFS theory mainly comprises two parts, AFS algebra and AFS structure: AFS algebra studies the logical operations on concepts, while AFS structure automatically derives the membership function of a fuzzy concept from the distribution of the data and the semantics of the concept.
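As a rough illustration of a membership function derived from the data rather than from intuition, the sketch below scores each customer by the fraction of the sample whose attribute value is no greater than theirs, and combines two concepts with min (AND) and max (OR). The rank-based form and the min/max operators are simplifying assumptions, not formula (1) of the patent, and the attribute names and values are hypothetical.

```python
def rank_membership(values):
    """Membership in the concept 'large': fraction of samples with value <= x."""
    n = len(values)
    return [sum(1 for v in values if v <= x) / n for x in values]


def fuzzy_and(mu_a, mu_b):
    """Conjunction of two concepts (assumed here to be the pointwise minimum)."""
    return [min(a, b) for a, b in zip(mu_a, mu_b)]


def fuzzy_or(mu_a, mu_b):
    """Disjunction of two concepts (assumed here to be the pointwise maximum)."""
    return [max(a, b) for a, b in zip(mu_a, mu_b)]


# Hypothetical attributes for four customers.
monthly_calls = [10, 40, 25, 5]
complaints = [0, 3, 1, 2]

mu_calls = rank_membership(monthly_calls)        # "many calls"
mu_complaints = rank_membership(complaints)      # "many complaints"
mu_both = fuzzy_and(mu_calls, mu_complaints)     # "many calls AND many complaints"
mu_either = fuzzy_or(mu_calls, mu_complaints)    # "many calls OR many complaints"
```

The point of the sketch is only that the membership degrees come from the distribution of the sample itself, so the same concept yields different degrees on a different data set.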
(2) Subtractive clustering method (SCM). Subtractive clustering is a density-based clustering algorithm. It treats every data point as a potential cluster center; after a cluster center has been selected, it subtracts that center's influence and searches for the next center. The invention introduces subtractive clustering, an unsupervised learning method, to compute the cluster centroids; it can quickly determine the number of clusters and the centroids from the raw data.
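A minimal sketch of subtractive clustering, assuming the standard Gaussian mountain function. The function name, the default values of `weight` and `eps`, and the sample points are hypothetical choices for illustration and are not taken from the patent.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def subtractive_clustering(points, radius, weight=1.5, eps=0.15, max_centroids=None):
    """Return centroids chosen by density.

    radius: neighborhood radius r_a
    weight: ratio r_b / r_a used when subtracting a centroid's influence
    eps:    stop when the current peak divided by the first peak drops below eps
    """
    alpha = 4.0 / radius ** 2
    beta = 4.0 / (weight * radius) ** 2
    # Initial mountain (density) value of every data point.
    mountain = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points]
    first_peak = max(mountain)
    limit = max_centroids or len(points)
    centroids = []
    while len(centroids) < limit:
        idx = max(range(len(points)), key=lambda i: mountain[i])
        peak = mountain[idx]
        if peak / first_peak < eps:
            break
        centre = points[idx]
        centroids.append(centre)
        # Subtract the influence of the newly chosen centroid from every point.
        mountain = [m - peak * math.exp(-beta * sq_dist(p, centre))
                    for m, p in zip(mountain, points)]
    return centroids


# Two well-separated groups of three points each.
points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
centroids = subtractive_clustering(points, radius=1.0)
```

On this sample the procedure stops after two centroids, one per group, without the number of clusters being supplied in advance.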
(3) K-means algorithm. K-means is a typical distance-based clustering algorithm. It uses distance as the measure of similarity: the closer two objects are, the more similar they are taken to be. The algorithm regards a cluster as a group of objects that lie close together, so its final goal is to obtain compact and independent clusters. The present invention therefore computes the clusters with K-means. Note that if the initial parameter values of K-means are poorly chosen, the clustering result may be inaccurate. The subtractive clustering method (SCM), by contrast, can generate more accurate input parameters from the raw data, including the cluster centroids and the number of clusters. The present invention therefore passes the parameters generated by subtractive clustering to K-means to improve its accuracy. K-means starts from the initialized cluster centroids, then iteratively assigns each data point to the nearest cluster and recomputes the cluster centroids until a termination condition is reached.
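The iteration just described can be sketched as follows, with the initial centroids passed in from outside (for example, from subtractive clustering) instead of being chosen at random. The function name, the `max_iter` cap, and the sample data are hypothetical.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means started from externally supplied centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
        # Termination: stop when no centroid moves.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
seeds = [(0.0, 0.0), (10.0, 0.0)]
final_centroids, clusters = kmeans(points, seeds)
```

Because the seeds already sit inside the two groups, the loop converges on the second pass; poorly placed seeds would need more iterations and can settle on a worse partition.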
By integrating axiomatic fuzzy sets (AFS) with the subtractive clustering method (SCM), the present invention forms a new algorithm, the semantic-driven subtractive clustering method (SDSCM). The process is as follows:
(1) Select relevant attributes using axiomatic fuzzy sets (AFS), and express fuzzy concepts with their membership functions and logical operations.
(2) From the computed membership degrees, automatically determine the neighborhood radius and weight coefficient of the subtractive clustering method (SCM).
(3) Compute the number of clusters and the centroids with the subtractive clustering algorithm by selecting and updating the mountain function. The subtractive clustering algorithm and axiomatic fuzzy sets are thereby integrated into the Semantic Driven Subtractive Clustering Method (SDSCM).
(4) Compute the clusters with the K-means algorithm from the cluster centroids obtained by the semantic-driven subtractive clustering method.
The details of the SDSCM algorithm are as follows.
Symbols used in the algorithm
Step 1: For the fuzzy concept supplied by the user, compute its membership degree with formula (1).
Step 2: Compute the sum of absolute differences of μ_η(x_i):
Step 3: Select the data point with the minimum value as the first cluster centroid.
Step 4: Compute the Euclidean distance between the first cluster centroid and every other data point. The neighborhood radius, which determines the range of influence of a cluster centroid, is the variance of these distances.
Step 5: To avoid obtaining cluster centroids that lie close together, set the weight coefficient. Once the parameters have been determined automatically, use the SCM algorithm to compute the cluster centroids.
Step 6: Set l = 1 and compute the mountain function of x_i.
Step 7: Select the maximum mountain function value,
and let the corresponding x_i be the first centroid.
Step 8: Set l = l + 1 and update the mountain function of each data vector according to the following formula.
Step 9: Select the data point associated with the largest mountain function value as the second centroid, and repeat step 6 until the following condition is satisfied:
where ε is a positive constant less than 1. Iteration stops when the ratio falls below ε.
Step 10: Finally, output the cluster centroids.
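Steps 2 to 4 can be sketched as below. Since the formula for step 2 is not reproduced in this text, the sketch assumes that the sum for each point is taken over the absolute differences between its membership degree and that of every other point, and it assumes the population variance for step 4. The function name and the sample values are hypothetical.

```python
import math
from statistics import pvariance


def first_centroid_and_radius(points, memberships):
    """Pick the first centroid from membership degrees and derive the radius."""
    n = len(points)
    # Step 2 (assumed form): for each point, sum |mu(x_i) - mu(x_j)| over all j.
    abs_diff_sums = [
        sum(abs(memberships[i] - memberships[j]) for j in range(n))
        for i in range(n)
    ]
    # Step 3: the point with the minimum sum becomes the first cluster centroid.
    first = min(range(n), key=lambda i: abs_diff_sums[i])
    # Step 4: Euclidean distances from that centroid to every other point;
    # the neighborhood radius is the variance of those distances.
    distances = [math.dist(points[first], p) for i, p in enumerate(points) if i != first]
    radius = pvariance(distances)
    return points[first], radius


points = [(0.0, 0.0), (3.0, 4.0), (9.0, 12.0)]
memberships = [0.2, 0.5, 0.9]  # hypothetical membership degrees from step 1
centroid, radius = first_centroid_and_radius(points, memberships)
```

Here the middle point has the membership degree closest to all others, so it is selected; its distances to the other two points are 5 and 10, giving a variance of 6.25.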
The foregoing is merely a preferred embodiment of the present invention. The numerical values and ranges mentioned in the above description are given only as preferred embodiments and are not intended to limit the invention. Those skilled in the art may make various modifications and changes to the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (2)

1. A big data clustering algorithm for reducing customer churn risk, the method comprising the following steps:
(1) selecting relevant attributes using axiomatic fuzzy set theory, and expressing fuzzy concepts with their membership functions and logical operations;
(2) automatically determining the neighborhood radius and weight coefficient of the subtractive clustering algorithm from the computed membership degrees;
(3) computing the number of clusters and the centroids with the subtractive clustering algorithm by selecting and updating the mountain function, whereby the subtractive clustering algorithm and axiomatic fuzzy sets are integrated into a semantic-driven subtractive clustering method;
(4) computing the clusters with the K-means algorithm from the cluster centroids obtained by the semantic-driven subtractive clustering method.
2. The big data clustering algorithm for reducing customer churn risk according to claim 1, characterized in that the clustering algorithm specifically comprises the following steps:
Step 1: For the fuzzy concept supplied by the user, compute its membership degree with formula (1).
Step 2: Compute the sum of absolute differences of μ_η(x_i):
Step 3: Select the data point with the minimum value as the first cluster centroid.
Step 4: Compute the Euclidean distance between the first cluster centroid and every other data point. The neighborhood radius, which determines the range of influence of a cluster centroid, is the variance of these distances.
Step 5: To avoid obtaining cluster centroids that lie close together, set the weight coefficient. Once the parameters have been determined automatically, use the SCM algorithm to compute the cluster centroids.
Step 6: Set l = 1 and compute the mountain function of x_i.
Step 7: Select the maximum mountain function value,
and let the corresponding x_i be the first centroid.
Step 8: Set l = l + 1 and update the mountain function of each data vector according to the following formula.
Step 9: Select the data point associated with the largest mountain function value as the second centroid, and repeat step 6 until the following condition is satisfied:
where ε is a positive constant less than 1. Iteration stops when the ratio falls below ε.
Step 10: Finally, output the cluster centroids.
CN201810170341.1A 2018-02-28 2018-02-28 A kind of big data clustering algorithm reducing customer churn risk Pending CN108416380A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810170341.1A CN108416380A (en) 2018-02-28 2018-02-28 A kind of big data clustering algorithm reducing customer churn risk

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810170341.1A CN108416380A (en) 2018-02-28 2018-02-28 A kind of big data clustering algorithm reducing customer churn risk

Publications (1)

Publication Number Publication Date
CN108416380A true CN108416380A (en) 2018-08-17

Family

ID=63129655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810170341.1A Pending CN108416380A (en) 2018-02-28 2018-02-28 A kind of big data clustering algorithm reducing customer churn risk

Country Status (1)

Country Link
CN (1) CN108416380A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109064470A (en) * 2018-08-28 2018-12-21 河南工业大学 A kind of image partition method and device based on adaptive fuzzy clustering
CN109064470B (en) * 2018-08-28 2022-02-22 河南工业大学 Image segmentation method and device based on self-adaptive fuzzy clustering
WO2024007580A1 (en) * 2022-07-07 2024-01-11 南京国电南自电网自动化有限公司 Power equipment parallel fault diagnosis method and apparatus based on hybrid clustering

Similar Documents

Publication Publication Date Title
CN110969290B (en) Runoff probability prediction method and system based on deep learning
CN107992976B (en) Hot topic early development trend prediction system and prediction method
WO2021120934A1 (en) Convolutional neural network-based method for automatically grouping drgs
Poczęta et al. Learning fuzzy cognitive maps using structure optimization genetic algorithm
CN109214503B (en) Power transmission and transformation project cost prediction method based on KPCA-LA-RBM
CN109344994A (en) A kind of prediction model method based on improvement moth optimization algorithm
CN109492748B (en) Method for establishing medium-and-long-term load prediction model of power system based on convolutional neural network
CN113283924B (en) Demand prediction method and demand prediction device
CN108710609A (en) A kind of analysis method of social platform user information based on multi-feature fusion
CN112733996A (en) GA-PSO (genetic Algorithm-particle swarm optimization) based hydrological time sequence prediction method for optimizing XGboost
CN108960486B (en) Interactive set evolution method for predicting adaptive value based on gray support vector regression
CN116307215A (en) Load prediction method, device, equipment and storage medium of power system
CN110930030B (en) Doctor skill level rating method
CN108416380A (en) A kind of big data clustering algorithm reducing customer churn risk
CN116187835A (en) Data-driven-based method and system for estimating theoretical line loss interval of transformer area
WO2015145978A1 (en) Energy-amount estimation device, energy-amount estimation method, and recording medium
CN109460608A (en) A method of the high gradient slope deformation prediction based on Fuzzy time sequence
CN116976529A (en) Cross-river-basin water diversion method and system based on supply-demand prediction dynamic correction
CN111882114A (en) Short-term traffic flow prediction model construction method and prediction method
CN111027841A (en) Low-voltage transformer area line loss calculation method based on gradient lifting decision tree
Wang et al. A new time series prediction method based on complex network theory
Sang et al. Ensembles of gradient boosting recurrent neural network for time series data prediction
CN117034116A (en) Machine learning-based traditional village space type identification method
CN109902870A (en) Electric grid investment prediction technique based on AdaBoost regression tree model
CN117194966A (en) Training method and related device for object classification model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180817
