CN113034264A - Method and device for establishing customer loss early warning model, terminal equipment and medium - Google Patents
Method and device for establishing customer loss early warning model, terminal equipment and medium
- Publication number
- CN113034264A (application CN202010920139.3A)
- Authority
- CN
- China
- Prior art keywords
- decision tree
- sample set
- training sample
- attributes
- attribute
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0203—Market surveys; Market polls
Abstract
The application is applicable to the technical field of machine learning, and provides a method, a device, terminal equipment and a medium for establishing a customer loss early warning model, wherein the method comprises the following steps: acquiring a training sample set, wherein the training sample set comprises a plurality of items of customer data; extracting a plurality of attributes from the training sample set, and establishing a decision tree according to the attributes and the training sample set; pruning the decision tree according to the weights of the attributes, and taking the attributes remaining in the pruned decision tree as the influence indexes of customer loss; and training a preset neural network model according to the influence indexes and the pruned decision tree to obtain the customer loss early warning model. The customer loss early warning model established by the method has high prediction accuracy and strong robustness.
Description
Technical Field
The application belongs to the technical field of machine learning, and particularly relates to a method and a device for establishing a client loss early warning model, terminal equipment and a medium.
Background
Customers are a very important resource for financial institutions. Taking banks as an example, with the advent of internet banking, mobile banking and a large number of internet financial products, the regional differentiation of financial services is gradually shrinking, customers' choices of financial services and financial products are increasingly diverse and free, and customers' dependence on and loyalty to a single banking institution are likewise declining. Accurately predicting the churn risk of a customer is therefore very important for banks.
At present, the risk of customer churn is generally predicted with a Back Propagation (BP) neural network algorithm, whose prediction accuracy is low and which tends to become unstable as a result of over-training.
Disclosure of Invention
The embodiment of the application provides a method, a device, a terminal device and a medium for establishing a client loss early warning model, which can improve the accuracy of judging the client loss early warning model and enhance the robustness of the client loss early warning model.
In a first aspect, an embodiment of the present application provides a method for building a customer churn early-warning model, including:
acquiring a training sample set, wherein the training sample set comprises a plurality of customer data;
extracting a plurality of attributes from the training sample set, and establishing a decision tree according to the plurality of attributes and the training sample set;
pruning the decision tree according to the weights of the attributes, and taking the attributes in the pruned decision tree as the influence indexes of the customer loss;
and training a preset neural network model according to the influence indexes and the trimmed decision tree to obtain the client loss early warning model.
In a second aspect, an embodiment of the present application provides an apparatus for building a customer churn early warning model, including:
the training sample set acquisition module is used for acquiring a training sample set, and the training sample set comprises a plurality of client data;
the decision tree establishing module is used for extracting a plurality of attributes from the training sample set and establishing a decision tree according to the attributes and the training sample set;
an influence index determining module, configured to prune the decision tree according to the weights of the multiple attributes, and use the attributes in the pruned decision tree as influence indexes of customer churn;
and the client loss early warning model determining module is used for training a preset neural network model according to the influence indexes and the trimmed decision tree to obtain the client loss early warning model.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method according to the first aspect.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the method of any one of the above first aspects.
Compared with the prior art, the embodiment of the application has the advantages that: in the embodiment of the application, data can be processed in advance to obtain a training sample set; then acquiring a plurality of attributes from the training sample set, and establishing a decision tree based on the training sample set and the attributes; pruning the decision tree, and taking the residual attributes in the pruned decision tree as the influence indexes of the customer loss; and training a preset neural network model according to the influence indexes and the trimmed decision tree to obtain a client loss early warning model. In the application, after the influence indexes influencing the loss of the client are screened out by adopting the decision tree, the neural network model is utilized for machine learning, and the early warning model of the loss of the client is obtained. The method of combining the two algorithms avoids the limitation of a single model, ensures the training speed and quality of the client loss early warning model, and improves the judgment accuracy and robustness of the client loss early warning model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a schematic flowchart illustrating a method for building a customer churn early warning model according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of data preprocessing according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a process for building a decision tree according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a neural network algorithm training process provided in an embodiment of the present application;
fig. 5 is a schematic flowchart of a method for building a customer churn early warning model according to a second embodiment of the present application;
fig. 6 is a schematic structural diagram of an apparatus for building a customer churn early-warning model according to a third embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to a fourth embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Fig. 1 is a schematic flowchart of a method for building a customer churn early-warning model according to an embodiment of the present application, and as shown in fig. 1, the method includes:
s101, obtaining a training sample set, wherein the training sample set comprises a plurality of customer data;
the execution subject of the embodiment is a terminal device, and specifically may be a computer or other computing device capable of performing data processing and machine learning.
The customer data can be obtained from a bank database, and churn of a bank customer can be defined according to bank characteristics and customer dynamics; for example, when a customer has not logged into the financial APP or performed any transfer operation within a preset time period, the customer's churn risk is considered high. Specifically, fig. 2 is a schematic flow chart of data preprocessing provided in an embodiment of the present application; referring to fig. 2, a manager may collect a large amount of customer data from the bank database in advance as a data set, perform data cleaning, data integration and data selection, and finally input the resulting data into the model as a training sample set. Data cleaning comprises filling missing data values, smoothing noisy data, and identifying or removing abnormal values; data integration refers to merging data from multiple data sources and storing the merged data in a consistent data store; data selection chooses the influence indexes of bank customer churn as input variables and finally determines the modeling data.
In addition, both a training sample set and a verification sample set are required during model training, so the processed data is divided into a training sample set and a validation sample set at a ratio of 2:1.
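As an illustration only (not part of the patent), the cleaning and 2:1 split described above can be sketched in Python; the field names (`deposits`, `app_logins`) and the mean-fill strategy are hypothetical choices:

```python
import random

def clean_and_split(records, seed=0):
    """Fill missing numeric values with the column mean, then split the
    records 2:1 into training and validation sets."""
    # Data cleaning: fill missing values (None) with the column mean.
    cols = records[0].keys()
    for col in cols:
        vals = [r[col] for r in records if r[col] is not None]
        mean = sum(vals) / len(vals)
        for r in records:
            if r[col] is None:
                r[col] = mean
    # 2:1 division into training and validation sample sets.
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 2 // 3
    return shuffled[:cut], shuffled[cut:]

# Hypothetical customer records with some missing values.
customers = [
    {"deposits": 120.0, "app_logins": 4},
    {"deposits": None,  "app_logins": 2},
    {"deposits": 80.0,  "app_logins": None},
]
train, valid = clean_and_split(customers)
```

In a real deployment the cleaning rules (imputation, outlier removal) would come from the bank's own data-quality policy; the sketch only shows where each step sits in the pipeline.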
S102, extracting a plurality of attributes from the training sample set, and establishing a decision tree according to the attributes and the training sample set;
specifically, the plurality of attributes may include, but are not limited to, the number of financial products owned by an individual, the number of electronic banking transactions, the monthly deposit average, financial account changes, financial transaction changes, the nearly March loan balance, the number of self-service device transactions, the number of monthly counter services, the number of interview client manager revisits, the number of bank APP accesses, the number of bank APP function clicks, and the like.
The decision tree model is a relatively intuitive probability model for analyzing uncertain events; it can classify and predict objects and is a method commonly used in data mining. A decision tree operates by top-down recursion: attributes are compared at the internal nodes, the branch to the next node is chosen according to the attribute value, and the final classification result is obtained at the leaf nodes. Decision trees have the advantages of being easy to understand and implement, requiring simple data preparation and a short running time, and allowing the analyst to observe the operation of the model throughout the whole process. In this embodiment, a decision tree is used to classify the data so as to screen out an index system that affects customer churn.
Fig. 3 is a schematic diagram of a process for building a decision tree according to an embodiment of the present application, and referring to fig. 3, the process for building a decision tree may include: through feature selection, node splitting and decision tree pruning, the training sample set is classified, and an index system influencing customer loss is output.
Specifically, in this embodiment, the C5.0 decision tree algorithm may be used to establish the decision tree. After the sample data is determined, the decision tree algorithm is started to screen the indexes. First, since the data set of this embodiment is an unbalanced data set, in order to reduce the proportional gap between churned and non-churned customers and improve the model accuracy, the number of majority-class samples is reduced by a random under-sampling method, and the data are divided into a training sample set and a verification sample set at a ratio of 2:1.
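The random under-sampling step can be sketched as follows (an illustrative sketch, not the patent's implementation; labels 1 = churned and 0 = retained are assumed):

```python
import random

def random_undersample(data, seed=0):
    """data: list of (features, label) pairs with label 1 = churned and
    0 = retained. Randomly drop majority-class samples until the two
    classes have equal counts, reducing the class imbalance."""
    rng = random.Random(seed)
    minority = [d for d in data if d[1] == 1]
    majority = [d for d in data if d[1] == 0]
    if len(majority) < len(minority):
        minority, majority = majority, minority
    kept = rng.sample(majority, len(minority))
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced

# 90 retained vs. 10 churned customers (hypothetical imbalance).
raw = [((i,), 0) for i in range(90)] + [((i,), 1) for i in range(10)]
balanced = random_undersample(raw)
```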
The gain ratio of each node for the bank-customer-churn influence indexes is calculated, following the characteristic that the C5.0 decision tree uses the gain ratio as the splitting criterion of the current node. Let S, the set of individual customer data of a bank over a certain time span, contain s samples belonging to m different classes $x_i$ (i = 1, 2, …, m), with $v_i$ samples in class $x_i$. Let D be an influence index of the training sample set S taking k distinct values $\{a_1, a_2, \ldots, a_k\}$; the values of D divide S into k subsets, where $S_j$ denotes the j-th subset (j = 1, 2, …, k) and $|S_{ij}|$ denotes the number of samples of class $x_i$ in subset $S_j$. The entropy of attribute D with respect to the classes $x_i$ (i = 1, 2, …, m) is then expressed as:

$$E(S, D) = \sum_{j=1}^{k} W_j\, I(s_{1j}, s_{2j}, \ldots, s_{mj}) \quad (1)$$
order toThen W isjIs a subset sjThe specific gravity in the data set S gives the same weight to the bank customer churn influence index input into the decision tree.
The information gain Gain(S, D) of the influence index D is calculated as:

$$Gain(S, D) = I(s_1, s_2, \ldots, s_m) - E(S, D) \quad (2)$$

where $I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p(x_i) \log_2 p(x_i)$ is the entropy of the sample set S, $p(x_i) = v_i / s$ is the probability of each class, and E(S, D) is the weighted sum of the entropies of the k subsets produced by the influence index D. The split information of this node is expressed as:

$$SplitInfo(S, D) = -\sum_{j=1}^{k} W_j \log_2 W_j \quad (3)$$

Equation (3) is the entropy of the data set S with respect to the values of the influence index D: the more evenly the samples are distributed over the values of D, the larger the split information. The gain ratio is then expressed as:

$$GainRatio(S, D) = \frac{Gain(S, D)}{SplitInfo(S, D)} \quad (4)$$
and selecting the influence index with the maximum gain ratio for splitting, and determining an optimal splitting point after the optimal splitting index is determined. The above process is repeated, and the samples are continuously grouped until each branch of the whole decision tree continues to be grouped and no longer makes sense, the grouping is stopped, and a complete decision tree is generated.
S103, pruning the decision tree according to the weights of the attributes, and taking the attributes in the pruned decision tree as the influence indexes of the customer loss;
The decision tree is iterated according to a Boosting algorithm, and a new weight is assigned to each index attribute, specifically as follows: let T be the number of decision-tree iterations, so that T training passes are performed in total; the classification model produced by the t-th training pass is $C_t$, and the T classification models are finally combined into a composite decision tree $C^*$. $w_i^t$ (i = 1, 2, …, s; t = 1, 2, …, T) is the new weight assigned to sample i in the t-th decision-tree generation pass, $\varepsilon_t$ is the weighted error rate of $C_t$, and $\beta_t$ is the weight adjustment factor.
the specific training process comprises the following steps:
1. Initialize the weights: assuming the number of classification models to be generated is T, let t = 1.
2. Assign each initial sample the uniform weight $w_i^1 = 1/s$, and build $C_t$ from the resulting probability distribution.
3. Compute the weighted error rate $\varepsilon_t$ of $C_t$ on the weighted samples.
4. If $\varepsilon_t > 0.5$, terminate the training process and let T = t − 1; if $\varepsilon_t = 0$, terminate the training process and let T = t; if $0 < \varepsilon_t \le 0.5$, continue with step 5.
5. Calculate $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$.
6. Reassign the sample weights: multiply the weight of every correctly classified sample by $\beta_t$ and renormalize, so that misclassified samples receive relatively larger weights.
7. If t = T, the whole training process ends; otherwise let t = t + 1 and return to step 2 for the next training pass.
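One reweighting step of this Boosting scheme, using $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$, can be sketched as follows (a minimal illustration with hypothetical sample data, not the patent's implementation):

```python
def update_weights(weights, correct, eps):
    """One Boosting reweighting step: the weight of every correctly
    classified sample is multiplied by beta_t = eps / (1 - eps), then all
    weights are renormalized, so misclassified samples gain relative weight."""
    beta = eps / (1.0 - eps)
    new_w = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(new_w)
    return [w / total for w in new_w]

# Four samples with uniform initial weights; sample 3 is misclassified,
# so the weighted error rate is eps = 0.25 (0 < eps <= 0.5, so we continue).
w0 = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]
eps = sum(w for w, ok in zip(w0, correct) if not ok)
w1 = update_weights(w0, correct, eps)
```

After the update the misclassified sample carries half the total weight, so the next decision tree concentrates on it.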
Specifically, the generated decision tree is pruned, that is, the indexes influencing bank customer churn are screened: redundant nodes that contribute little to classification accuracy are cut off, and the indexes closely related to bank customer churn are retained. The specific steps are as follows: the error sample rate of each node of the decision tree $C^*$ is substituted into equation (5) to obtain the upper bound $U_{CF}(t)$ of the confidence interval. With the node error rate $r(t) = e(t)/n(t)$, the standard C4.5/C5.0 pessimistic estimate (reconstructed here, as the original formula image is not available) is:

$$U_{CF}(t) = \frac{r(t) + \frac{z^2}{2n(t)} + z\sqrt{\frac{r(t)(1 - r(t))}{n(t)} + \frac{z^2}{4n(t)^2}}}{1 + \frac{z^2}{n(t)}} \quad (5)$$

where z is the standard normal quantile corresponding to the confidence level CF, n(t) is the total number of samples at a node t, e(t) is the number of samples misclassified at node t, r(t) is the error rate of node t, and CF is the confidence level of the confidence interval $[L_{CF}, U_{CF}]$.
The number of error samples E(t) of node t in the decision tree $C^*$ is calculated as:

$$E(t) = n(t) \times U_{CF}(t) \quad (6)$$
Let $E_1$ be the sum of the error-sample counts of all nodes of the subtree $C_t$ whose root node is t; $E_2$ the error-sample count obtained when the subtree $C_t$ is replaced by a leaf node; and $E_3$ the error-sample count of the largest branch of subtree $C_t$. All nodes of the decision tree are traversed bottom-up and pruned according to three cases:
e1 min, no pruning; e2 min, pruning the subtree and replacing it with a leaf node; e3 minimum, grafting subtree CtTherefore, the maximum branch is replaced, and according to the rules, n (less than or equal to 11) indexes with the most influence are screened while the influence indexes of bank customer loss are classified.
In this step, the decision tree is calculated and iterated, and the attributes with large weights are retained as the influence indexes with which the neural network, in the next step, judges whether a customer is a churning customer.
And S104, training a preset neural network model according to the influence indexes and the trimmed decision tree to obtain the client loss early warning model.
Fig. 4 is a schematic diagram of the neural network algorithm training process provided in an embodiment of the present application. Referring to fig. 4, the output of the decision tree is used as the input data of a BP neural network: the influence indexes screened by the decision tree determine the number of inputs of the BP neural network. The input vector is denoted p, the hidden layer contains $s_1$ neurons with a sigmoid activation function f, and the output layer contains 1 neuron whose output a is used to distinguish stable customers from early-warning customers; the expected value is t.
The output expression of the i-th neuron in the hidden layer is:

$$a^1_i = f\Big(\sum_{j=1}^{r} w^1_{ij}\, p_j + b^1_i\Big)$$

where r is the number of inputs, $w^1_{ij}$ are the hidden-layer weights and $b^1_i$ is the bias.
the output expression of the kth neuron of the output layer is as follows:
calculating an error, wherein an error function expression is as follows:
wherein t iskIs a desired value.
The weight change of the output layer is calculated as:

$$\Delta w^2_{ki} = \eta\, \delta_k\, a^1_i$$

where $\delta_k = e_k\, f'(\cdot)$, $e_k = t_k - a^2_k$, and η is the model learning rate. η is not a constant but is adjusted automatically in the BP neural network model, between a minimum learning-rate value $\eta_{low}$ and a maximum value $\eta_{high}$, with d as the decay factor.
The weight change of the hidden layer is calculated as:

$$\Delta w^1_{ij} = \eta\, \delta^1_i\, p_j, \qquad \delta^1_i = f'(\cdot)\sum_{k}\delta_k\, w^2_{ki}$$
Finally, it is checked whether the total error $E_{total}$ of the BP neural network meets the accuracy requirement $E_{min}$: if $E_{total} < E_{min}$, the neural network training ends and the bank customer churn early warning model training is complete; otherwise E is reset to 0, the output values of all layers are recalculated, and training continues until the training goal is reached.
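A minimal one-hidden-layer BP network of this kind can be sketched as follows. This is an illustrative toy, not the patent's implementation: it uses a fixed learning rate rather than the adaptive η described above, and the training data (an OR-style function of two hypothetical indexes) stands in for real churn indexes:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_bp(data, n_hidden=3, eta=0.5, epochs=2000, seed=0):
    """Train a tiny BP network: one hidden layer of n_hidden sigmoid
    neurons and a single sigmoid output neuron. Returns a predict()."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Last column of each weight row is the bias term.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for p, t in data:
            # Forward pass through hidden and output layers.
            h = [sigmoid(sum(w * x for w, x in zip(ws, p + [1.0]))) for ws in w1]
            a = sigmoid(sum(w * x for w, x in zip(w2, h + [1.0])))
            # Backward pass: delta_k = (t_k - a_k) * f'(net).
            dk = (t - a) * a * (1 - a)
            for i in range(n_hidden):
                di = dk * w2[i] * h[i] * (1 - h[i])
                for j in range(n_in):
                    w1[i][j] += eta * di * p[j]
                w1[i][n_in] += eta * di
            for i in range(n_hidden):
                w2[i] += eta * dk * h[i]
            w2[n_hidden] += eta * dk

    def predict(p):
        h = [sigmoid(sum(w * x for w, x in zip(ws, p + [1.0]))) for ws in w1]
        return sigmoid(sum(w * x for w, x in zip(w2, h + [1.0])))

    return predict

# Toy stand-in for churn indexes: output 1 when either index is active.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
predict = train_bp(data)
```

A prediction above 0.5 would correspond to an early-warning customer and one below 0.5 to a stable customer in this toy encoding.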
In this embodiment, a decision tree algorithm is used to screen the many indexes that influence bank customer churn and to construct an optimized system of churn influence indexes. A BP neural network then learns and trains on the optimized influence index system, and the customers are finally classified into stable customers and early-warning customers for output. Compared with the traditional BP neural network algorithm, the hybrid decision-tree/BP-neural-network algorithm eliminates indexes with little influence while establishing an index system for the bank customer churn early warning model; analysts can observe the operation of the model throughout the decision tree calculation, and the efficiency and accuracy of the early warning model are improved, making it convenient for decision makers to retain stable customers, adjust service schemes for early-warning customers, and improve marketing strategies according to the data characteristics.
Fig. 5 is a schematic flow chart of a method for building a customer churn early warning model according to a second embodiment of the present application. As shown in fig. 5, first, the time span of customer churn is determined based on customer data analysis, combined with the uncertainty of the market and the fluctuation of the economic cycle, and the bank's customer churn is correspondingly calculated and defined. Secondly, the bank's customer data is prepared: after import, the data is preprocessed and cleaned, records with the same attributes are integrated, and the data is preliminarily classified and selected as the input data entering the model. The data is then classified with a decision tree algorithm, and an index system influencing bank customer churn is screened out. This index system is input into the neural network model for function calculation and error calculation, and the judgment result is output when the error value meets the requirement. Further, the bank can analyze and evaluate the causes according to the early warning result so as to adjust the customer service scheme.
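The preparation step (cleaning, then integrating records that share the same attribute) can be sketched as follows; the field name `customer_id` and the dict-based record layout are illustrative assumptions, not the patent's data schema:

```python
def preprocess(records):
    """Sketch of the data preparation step from fig. 5."""
    # cleaning: drop records containing missing values
    cleaned = [r for r in records if all(v is not None for v in r.values())]
    # integration: merge records that share the same customer_id
    merged = {}
    for r in cleaned:
        merged.setdefault(r["customer_id"], {}).update(r)
    return list(merged.values())
```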
In this embodiment, multi-dimensional information of bank customers (mainly personal customers) is collected and the data is preprocessed. Customer early warning indexes are screened with a decision tree model, and the index system of the customer churn early warning model is built. The screened indexes serve as input-layer data for a neural network model, which is trained by repeatedly judging the error function and adjusting the neuron weights to reduce the error until it falls within the required range, at which point training stops. Finally, stable customers and early-warning customers are judged and output, so that decision makers can analyze and evaluate the causes, improve strategies, and adjust customer service schemes. Compared with a decision tree algorithm alone, the prediction precision of the customer churn model in this embodiment is higher; compared with a BP neural network algorithm alone, its stability is stronger. Aiming at the shortcoming that a single model's prediction effect is not ideal, this embodiment exploits the advantages of different models while avoiding their disadvantages to construct the bank customer churn early warning model, achieving complementary strengths and improving the accuracy and robustness of the early warning model's customer behavior prediction.
Fig. 6 is a schematic structural diagram of an apparatus for building a customer churn early-warning model according to a third embodiment of the present application, and as shown in fig. 6, the apparatus includes:
a training sample set obtaining module 61, configured to obtain a training sample set, where the training sample set includes a plurality of customer data;
a decision tree building module 62, configured to extract a plurality of attributes from the training sample set, and build a decision tree according to the plurality of attributes and the training sample set;
an influence index determining module 63, configured to prune the decision tree, and use the attribute in the pruned decision tree as an influence index of customer churn;
and a customer churn early warning model determining module 64, configured to train a preset neural network model according to the influence indexes and the pruned decision tree, so as to obtain the customer churn early warning model.
The decision tree building module 62 includes:
the classification submodule is used for dividing the training sample set into a plurality of categories according to the attributes;
a gain ratio calculation sub-module for calculating gain ratios of each attribute with respect to the plurality of classes, respectively;
the splitting attribute calculation submodule is used for taking the attribute with the highest gain ratio as the splitting attribute of each node in the decision tree to be established;
and the splitting submodule is used for splitting each node according to the splitting attribute to obtain the decision tree.
The split submodule includes:
and the child node establishing unit is used for establishing a plurality of child nodes corresponding to the attribute values for each node according to the attribute values corresponding to the split attributes to obtain the decision tree.
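The gain-ratio calculation these submodules perform follows the standard C4.5 definition given in claim 3. A minimal sketch, assuming samples are dicts mapping attribute names to discrete values:

```python
import math
from collections import Counter

def entropy(labels):
    """I(s1, ..., sm): entropy of a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(samples, labels, attr):
    """C4.5-style gain ratio of one attribute over the sample set."""
    base = entropy(labels)
    n = len(samples)
    # partition the sample set S into subsets S_j by the attribute's values
    parts = {}
    for s, y in zip(samples, labels):
        parts.setdefault(s[attr], []).append(y)
    cond = sum(len(p) / n * entropy(p) for p in parts.values())   # E(S, D)
    gain = base - cond                                            # Gain(S, D)
    split_info = -sum(len(p) / n * math.log2(len(p) / n)
                      for p in parts.values())                    # SplitInfo(S, D)
    return gain / split_info if split_info > 0 else 0.0
```

The attribute with the highest gain ratio would then be chosen as the splitting attribute of the current node.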
The influence index determination module 63 includes:
the iteration submodule is used for sequentially iterating the decision tree by adopting a plurality of training samples in the training sample set and a preset iteration algorithm;
and the pruning submodule is used for pruning the decision tree according to the sample classification error quantity of each child node in the decision tree after each iteration is finished.
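The per-node pruning by sample classification errors can be sketched as a bottom-up pass that replaces a subtree with a majority-class leaf when the leaf misclassifies no more training samples than the subtree does. The node layout ({'attr', 'children'} for internal nodes, {'leaf'} for terminals) is an illustrative assumption, since the patent does not spell out the data structure:

```python
from collections import Counter

def classify(node, sample):
    # walk down the tree until a leaf is reached
    while "leaf" not in node:
        node = node["children"][sample[node["attr"]]]
    return node["leaf"]

def prune(node, samples, labels):
    """Bottom-up error-based pruning sketch."""
    if "leaf" in node or not samples:
        return node
    # first prune each subtree on the samples routed to it
    for val, child in list(node["children"].items()):
        idx = [i for i, s in enumerate(samples) if s[node["attr"]] == val]
        node["children"][val] = prune(child,
                                      [samples[i] for i in idx],
                                      [labels[i] for i in idx])
    # compare the subtree's misclassifications with a majority-class leaf
    subtree_err = sum(classify(node, s) != y for s, y in zip(samples, labels))
    majority = Counter(labels).most_common(1)[0][0]
    leaf_err = sum(y != majority for y in labels)
    if leaf_err <= subtree_err:
        return {"leaf": majority}
    return node
```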
The customer churn early warning model determining module 64 includes:
a setting submodule, configured to use the number of influence indexes as the dimension of the input data of the neural network model and to set the number of neurons in the output layer of the neural network model to 1;
and the training submodule is used for taking the data contained in the pruned decision tree as the input data of the neural network model and training the neural network model to obtain the customer churn early warning model.
The customer churn early warning model determining module 64 further includes:
the error value calculation submodule is used for calculating the error value of the neural network model after each training;
and the judgment submodule is used for taking the neural network model as the client loss early warning model when the error value of the neural network model is smaller than a preset precision value.
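The judgment these submodules produce from the single output neuron can be sketched as a thresholded forward pass; the weight-matrix layout and the 0.5 threshold are assumptions for illustration:

```python
import numpy as np

def classify_customer(model, indexes, threshold=0.5):
    """Label a customer from the trained model's single output neuron."""
    W1, W2 = model
    h = 1.0 / (1.0 + np.exp(-(indexes @ W1)))   # hidden-layer activations
    a = 1.0 / (1.0 + np.exp(-(h @ W2)))         # single output neuron
    return int(a.item() >= threshold)           # 1 = early-warning, 0 = stable
```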
Fig. 7 is a schematic structural diagram of a terminal device according to a fourth embodiment of the present application. As shown in fig. 7, the terminal device 7 of this embodiment includes: at least one processor 70 (only one shown in fig. 7), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, the processor 70 implementing the steps in any of the various method embodiments described above when executing the computer program 72.
The terminal device 7 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will appreciate that fig. 7 is only an example of the terminal device 7 and does not constitute a limitation on it: the device may include more or fewer components than shown, combine certain components, or use different components; for example, it may further include input/output devices, network access devices, and the like.
The processor 70 may be a Central Processing Unit (CPU); it may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 71 may, in some embodiments, be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. In other embodiments, the memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing an operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application provide a computer program product, which when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments when executed.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the apparatus/terminal device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random-Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.
Claims (10)
1. A method for establishing a customer churn early warning model is characterized by comprising the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of customer data;
extracting a plurality of attributes from the training sample set, and establishing a decision tree according to the plurality of attributes and the training sample set;
pruning the decision tree according to the weights of the attributes, and taking the attributes in the pruned decision tree as the influence indexes of the customer loss;
and training a preset neural network model according to the influence indexes and the pruned decision tree to obtain the customer churn early warning model.
2. The method of claim 1, wherein the decision tree includes a plurality of nodes, each node corresponding to a respective one of the attributes, and wherein building the decision tree based on the plurality of attributes and the set of training samples comprises:
according to the attributes, dividing the training sample set into a plurality of categories;
calculating gain ratios of each attribute with respect to the plurality of classes, respectively;
taking the attribute with the highest gain ratio as the splitting attribute of each node in the decision tree to be established;
and splitting each node according to the splitting attribute to obtain the decision tree.
3. The method of claim 2, wherein the gain ratio for each attribute is calculated separately for the plurality of classes using the following formula:
wherein D represents an attribute, S represents the training sample set, Gain(S, D) represents the information gain of the attribute D, and SplitInfo(S, D) represents the split information of the node; wherein,
Gain(S,D)=I(s1,s2,...,sm)-E(S,D)
where E(S, D) represents the weighted sum of the entropies of the k subsets divided by the attribute D; p(x_i) represents the probability of occurrence of class x_i; s represents the number of samples in the training sample set; each training sample belongs to one of m different classes x_i (i = 1, 2, ..., m), and each class x_i contains v_i samples; the attribute D has k different values {a_1, a_2, ..., a_k}, by which the training sample set S is divided into k different subsets, S_j denoting the j-th subset (j = 1, 2, ..., k); and |S_ij| denotes the number of samples of class x_i in the subset S_j.
4. The method of claim 3, wherein the splitting the nodes to obtain the decision tree according to the splitting attribute comprises:
and establishing a plurality of child nodes corresponding to the attribute values for each node according to the attribute values corresponding to the split attributes to obtain the decision tree.
5. The method of claim 2, wherein the pruning the decision tree comprises:
adopting a plurality of training samples in the training sample set and a preset iterative algorithm to sequentially iterate the decision tree;
and after each iteration is finished, pruning the decision tree according to the sample classification error quantity of each child node in the decision tree.
6. The method of claim 5, wherein the training a preset neural network model according to the influence indexes and the pruned decision tree comprises:
taking the number of the influence indexes as the dimensionality of input data of the neural network model; setting neurons of an output layer of the neural network model to be 1;
and taking the data contained in the pruned decision tree as input data of the neural network model, and training the neural network model to obtain the customer churn early warning model.
7. The method of claim 6, further comprising:
calculating an error value of the neural network model after each training;
and when the error value of the neural network model is smaller than a preset precision value, taking the neural network model as the client loss early warning model.
8. An apparatus for establishing a customer churn early warning model, comprising:
the training sample set acquisition module is used for acquiring a training sample set, and the training sample set comprises a plurality of client data;
the decision tree establishing module is used for extracting a plurality of attributes from the training sample set and establishing a decision tree according to the attributes and the training sample set;
an influence index determining module, configured to prune the decision tree according to the weights of the multiple attributes, and use the attributes in the pruned decision tree as influence indexes of customer churn;
and the customer churn early warning model determining module is used for training a preset neural network model according to the influence indexes and the pruned decision tree to obtain the customer churn early warning model.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010920139.3A CN113034264A (en) | 2020-09-04 | 2020-09-04 | Method and device for establishing customer loss early warning model, terminal equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113034264A true CN113034264A (en) | 2021-06-25 |
Family
ID=73014311
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010920139.3A Pending CN113034264A (en) | 2020-09-04 | 2020-09-04 | Method and device for establishing customer loss early warning model, terminal equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113034264A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113570044A (en) * | 2021-07-30 | 2021-10-29 | 中国银行股份有限公司 | Customer loss analysis model training method and device |
CN114548523A (en) * | 2022-01-26 | 2022-05-27 | 深圳市傲天科技股份有限公司 | User viewing information prediction method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104506340A (en) * | 2014-11-21 | 2015-04-08 | 河南中烟工业有限责任公司 | Creation method of decision tree in industrial Ethernet fault diagnosis method |
US20150310336A1 (en) * | 2014-04-29 | 2015-10-29 | Wise Athena Inc. | Predicting customer churn in a telecommunications network environment |
CN109741826A (en) * | 2018-12-13 | 2019-05-10 | 华中科技大学鄂州工业技术研究院 | Anaesthetize evaluation decision tree constructing method and equipment |
WO2019153878A1 (en) * | 2018-02-06 | 2019-08-15 | 华为技术有限公司 | Data processing method based on machine learning, and related device |
Non-Patent Citations (4)
Title |
---|
SPSS学堂: "C4.5算法剪枝2", https://zhuanlan.zhihu.com/p/165647231, pages 1 - 4 *
张朝阳: "深入浅出:工业机器学习算法详解与实战", 机械工业出版社, pages 140 - 141 *
金星路吴彦祖: "决策树——C4.5算法(断断续续更新)", pages 1 - 10, Retrieved from the Internet <URL:https://www.douban.com/note/691358224/?type=rec&_i=2391116d_dDbOH> *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113570044A (en) * | 2021-07-30 | 2021-10-29 | 中国银行股份有限公司 | Customer loss analysis model training method and device |
CN114548523A (en) * | 2022-01-26 | 2022-05-27 | 深圳市傲天科技股份有限公司 | User viewing information prediction method, device, equipment and storage medium |
CN114548523B (en) * | 2022-01-26 | 2023-11-07 | 深圳市傲天科技股份有限公司 | User viewing information prediction method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||