CN115115389A

CN115115389A - Express customer loss prediction method based on value subdivision and integrated prediction

Info

Publication number: CN115115389A
Application number: CN202210236263.7A
Authority: CN
Inventors: 孙哲; 曹艺译; 孙知信; 赵学健; 汪胡青; 宫婧; 胡冰
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2022-03-11
Filing date: 2022-03-11
Publication date: 2022-09-27
Anticipated expiration: 2042-03-11
Also published as: CN115115389B

Abstract

The invention provides an express customer loss prediction method based on value subdivision and integrated prediction. The method comprises a customer value subdivision module, a loss prediction and early warning module, and a personalized saving module. The customer value subdivision module measures customer value and classifies customers. The loss prediction and early warning module comprises a network point customer loss prediction module and a single customer loss rate prediction module, and predicts whether a customer will be lost and with what probability. The early warning and personalized saving module provides personalized saving schemes for customers of different value according to the influence index system of customer loss and the value importance of each customer. The invention can classify customers accurately, can predict with high precision whether a customer will be lost, the loss probability, and the number of customers lost at each network point, and provides personalized loss early warning according to the prediction results.

Description

Express customer loss prediction method based on value subdivision and integrated prediction

Technical Field

The invention relates to an express customer loss prediction method based on value subdivision and integrated prediction, and belongs to the technical field of logistics and machine learning.

Background

Because the express service industry in China started relatively late, its service concepts and marketing management modes have not adapted well to the development requirements of the market. When formulating a customer-facing operation strategy, it is desirable to apply different strategies to different customers and thereby achieve precise operation. The premise of precise operation is customer relationship management, and the core of customer relationship management is customer classification. Through customer classification, customer groups can be subdivided, low-value and high-value customers can be distinguished, different personalized services can be provided to different customer groups, and limited resources can be allocated reasonably among customers of different value, so that benefit is maximized.

This patent relates to the application of the following algorithm:

The RFM model is an important tool for measuring customer value and customer profitability. The model describes a customer's value through three indexes: the time of the latest consumption R, the consumption frequency F, and the consumption amount M.

A meta-heuristic algorithm is an algorithm inspired by biological behaviors and physical phenomena; its core idea is to balance random behavior and local search during the search process. For many multi-modal, discrete and non-differentiable real-world optimization problems, meta-heuristic algorithms show excellent operability and optimization capability, and they have been applied successfully in various scientific fields.

The chimpanzee optimization algorithm (ChOA) is an optimization algorithm inspired by the hunting behavior of chimpanzees in nature, which take different actions to search for prey according to a division of labor. The method simulates the individual intelligence, sexual motivation and predation behavior of chimpanzees in nature, and constructs an effective optimization scheme through processes such as driving, chasing and attacking. The standard ChOA algorithm classifies the chimpanzee population into four types: attackers, barriers, drivers and chasers. The attackers are the leaders of the population, the other three classes of chimpanzees assist in the hunt, and their social status declines in that order.

The sine cosine algorithm is a recent nature-inspired optimization algorithm that solves an optimization problem by creating a number of random candidate solutions and updating them with a sine and cosine mathematical model. The sine mechanism drives the global search for the optimal solution and reduces the optimization blind spots of the cosine mechanism, making individuals less likely to fall into local optima; the cosine mechanism drives local exploitation and compensates for the slow convergence of the sine-driven global search, which improves exploration capacity and accelerates convergence. Used together, sine and cosine balance the exploration and exploitation capabilities of the algorithm and jointly improve its performance.

Gaussian mutation is a mutation operation that improves the local search performance of a genetic algorithm in key search areas. When the mutation is performed, the original gene value is replaced by a random number drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$. From the characteristics of the normal distribution, it follows that Gaussian mutation mainly searches a local region near the original individual. Gaussian mutation creates a new offspring by adding a random value drawn from a Gaussian distribution to each element of the individual's vector.

Ensemble learning is a machine learning method in which a series of learners are trained and their results are combined by a certain rule, so that a better learning effect is obtained than with a single learner. Ensemble learning can be used for classification, regression, feature selection, outlier detection and similar tasks, and it appears in virtually every area of machine learning.

Disclosure of Invention

The invention aims to provide an express customer loss prediction method based on value subdivision and integrated prediction, which introduces improvements that address the factors influencing customer loss, the individual requirements of different customers, and the precise operation of enterprises.

The technical scheme of the invention is as follows: an express delivery customer loss prediction method based on value subdivision and integrated prediction comprises a customer value subdivision module, a loss prediction and early warning module and a personalized saving module,

the customer value subdivision module is used for customer value measurement and customer classification. An LSRMT customer value subdivision model is designed by improving the RFM model: the relevant indexes are introduced, each index value is assigned an initial grade, the index weights are then determined by a dual-objective constrained model of the index weights, and finally each customer's total value score is calculated by summing the value indices of the indexes, which yields the customer classification;

the loss prediction and early warning module comprises a network point customer loss prediction module and a single customer loss rate prediction module. The network point customer loss prediction module comprises constructing an influence index system of customer loss, improving the chimpanzee optimization algorithm, and predicting loss with a customer loss prediction model that fuses the improved chimpanzee optimization algorithm with XGBoost. The single customer loss rate prediction module mainly comprises constructing a customer information system and predicting the single customer loss rate with an ensemble learning model: new feature attributes are generated from the customers' original behavior data to construct the customer information system; several ensemble learning models serve as base prediction classifiers, each trained on a selected subset of the feature attributes; the weights of the sub-models are then trained by a linear classifier; and finally whether a customer will be lost, and the loss rate, are predicted from the weighted result;

the early warning and personalized saving module provides personalized saving schemes for customers of different value according to the influence index system of customer loss and the value importance of each customer. It establishes a target constraint model relating the influence index of queuing time to customer loss, so that enterprises receive more realistic and credible data support while customer loss is minimized.

Further, the express delivery customer churn prediction method based on value subdivision and integrated prediction includes: the customer value segmentation module comprises the following steps:

Step 1: define the following customer value indexes: customer relationship duration L, customer sending activity S, customer receiving activity R, average customer spend M, and customer trust T; and assign each index an initial grade $x_j$ according to the sorted data set,

where $j = 1, 2, 3, 4, 5$; $a$ denotes a lower threshold and $b$ an upper threshold, whose specific values are chosen reasonably by binning the actual data set;

Step 2: measure the value information $VI_{ij}$ of each value index from the selected customer sample data, and substitute the data into a dual-objective constrained optimization model, based on minimizing the uncertainty of the value information $VI_{ij}$ and the inconsistency between the index weight $W_j$ and the sample weight component $w_{ij}$, to solve for the preferred weights:

value information uncertainty objective function:

weight consistency objective function:

constraint (sum of index weights is 1):

where $m$ is the number of selected customer index samples, $W_j$ is the weight of the j-th index, and $w_{ij}$ is the weight component of the j-th index for the i-th sample;

Step 3: calculate the customer's value index $V_j$ for each index from the initial grade and the weight assigned to that index; $V_j$ expresses the customer's score at each index layer and is calculated as:

$V_j = x_j \times W_j$

The value indices $V_j$ of the customer's indexes are summed to obtain the total value score $V_{sum}$,

$V_{sum} = \sum_{j=1}^{5} V_j$

Step 4: rank the customers by total value score $V_{sum}$ and divide them into value classes: core value customers (user1), general value customers (user2), and potential value customers (user3).
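Steps 3 and 4 can be illustrated with a minimal Python sketch. It assumes the index weights $W_j$ have already been solved in Step 2 and are simply passed in. The tier shares `core_share` and `general_share` are hypothetical cut-offs chosen for illustration; the source does not state how the ranked list is split.

```python
from typing import Dict

INDICATORS = ("L", "S", "R", "M", "T")


def value_indices(grades: Dict[str, float], weights: Dict[str, float]) -> Dict[str, float]:
    """Per-index value index V_j = x_j * W_j."""
    return {j: grades[j] * weights[j] for j in INDICATORS}


def total_value_score(grades: Dict[str, float], weights: Dict[str, float]) -> float:
    """Total value score V_sum, the sum of the five V_j."""
    return sum(value_indices(grades, weights).values())


def classify_customers(scores: Dict[str, float],
                       core_share: float = 0.2,
                       general_share: float = 0.5) -> Dict[str, str]:
    """Rank customers by V_sum (descending) and cut the ranking into three tiers.

    The two share parameters are illustrative assumptions, not values from the source.
    """
    ranked = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    core_cut = round(len(ranked) * core_share)
    general_cut = round(len(ranked) * (core_share + general_share))
    labels = {}
    for position, customer_id in enumerate(ranked):
        if position < core_cut:
            labels[customer_id] = "user1"   # core value customer
        elif position < general_cut:
            labels[customer_id] = "user2"   # general value customer
        else:
            labels[customer_id] = "user3"   # potential value customer
    return labels
```

A fixed score threshold per tier would work equally well; rank-based shares are used here only because they need no knowledge of the score scale.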

Further, the express delivery customer churn prediction method based on value subdivision and integrated prediction includes: the network point customer loss prediction module comprises the following steps:

(1) Constructing an index system influencing customer loss;

(2) constructing an improved chimpanzee optimization algorithm, and training a model to obtain related parameters;

(3) constructing and optimizing a BSGChOA_XGBoost model for training;

(4) predicting the customer loss under different indexes.

Further, the express delivery customer churn prediction method based on value subdivision and integrated prediction includes: in step (1), the index system influencing customer loss is a three-layer index system comprising a target layer, a criterion layer and an index layer.

Further, the express delivery customer churn prediction method based on value subdivision and integrated prediction includes: the step (2) comprises the following steps,

Step 1: initialize the parameter settings of the improved chimpanzee optimization algorithm;

Step 2: generate the initial population. An improved Bernoulli chaotic mapping is applied to the positions of the initial population, and a random variable factor is introduced to make the initial population more uniformly distributed: a chaotic sequence in the interval $[0, 1]$ is generated through the chaotic mapping relation and then converted into the search space of the individuals to produce the initial population;

wherein i represents the current population scale, k represents the variable serial number of the chaotic mapping,

expressing the k mapping function value;
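The mapping formula itself is not reproduced in this text, so the following Python sketch falls back on the standard Bernoulli shift map as a stand-in. The parameter `lam`, the random starting value per individual (a simple substitute for the patent's "random variable factor"), and all function names are assumptions for illustration, not the patented improved mapping.

```python
import random
from typing import List, Optional


def bernoulli_sequence(length: int, lam: float, start: float) -> List[float]:
    """Iterate the standard Bernoulli shift map, producing values in [0, 1]."""
    z = start
    sequence = []
    for _ in range(length):
        if z <= 1.0 - lam:
            z = z / (1.0 - lam)
        else:
            z = (z - (1.0 - lam)) / lam
        sequence.append(z)
    return sequence


def init_population(n: int, dim: int, lower: float, upper: float,
                    lam: float = 0.4,
                    rng: Optional[random.Random] = None) -> List[List[float]]:
    """Build n individuals by scaling chaotic values from [0, 1] into [lower, upper]."""
    rng = rng or random.Random()
    population = []
    for _ in range(n):
        start = rng.uniform(0.01, 0.99)  # assumed random factor: a fresh start value
        chaos = bernoulli_sequence(dim, lam, start)
        population.append([lower + c * (upper - lower) for c in chaos])
    return population
```

For example, `init_population(30, 4, 0.0, 1.0)` would produce 30 four-dimensional individuals, matching the population size used later in the embodiment.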

Step 3: calculate the fitness of each individual in the chimpanzee population, select the positions of the 5 individuals with the best fitness, and record them as $X_{attacker}$, $X_{observer}$, $X_{chaser}$, $X_{barrier}$ and $X_{driver}$;

Step 4: update the convergence factor $f(t)$ and the coefficient vectors $a$ and $c$,

$a = 2 \times f(t) \times R_3 - f(t)$ (9)

$c = 2 \times R_5$ (10)

where $R_1$, $R_3$ and $R_5$ are random factors in $[0, 1]$, $T_{max}$ is the maximum number of iterations, and $k$ is an adjusting factor with $k \in [1, 5]$;
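Equations (9) and (10) translate directly into code. In the minimal sketch below, the convergence factor $f(t)$ is passed in as a plain number `f_t`, because its defining formula is not reproduced in this text; the function name is hypothetical.

```python
import random
from typing import Tuple


def update_coefficients(f_t: float, rng: random.Random) -> Tuple[float, float]:
    """Draw R3 and R5 uniformly from [0, 1) and apply equations (9) and (10)."""
    r3 = rng.random()
    r5 = rng.random()
    a = 2.0 * f_t * r3 - f_t  # equation (9): a lies in [-f_t, f_t) for f_t > 0
    c = 2.0 * r5              # equation (10): c lies in [0, 2)
    return a, c
```

Because $a$ is bounded by $f(t)$, a convergence factor that shrinks over the iterations narrows the step size and shifts the search from exploration toward exploitation.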

Step 5: observer (investigator) position updating. The other individuals decide whether to start hunting according to the current position information of the observer. If the current prey arrest success rate $P_{arrested}$ is greater than the minimum arrest rate $P_{min}$, the attacker, barrier, driver and chaser start hunting immediately, while the observer continues to search for the next prey; the observer's position information is updated according to the mapped initial population information and the positional relationship between the observer and the other chimpanzee individuals,

where $P_{arrested}$ denotes the current arrest rate, $P_{min}$ denotes the minimum arrest rate, $R_2$ denotes a random number in $[0, 1]$, and SND is a random number that obeys the standard normal distribution;

Step 6: after the observer's position is updated, the remaining individuals enter the search iteration stage. The random factor $R_4$ is updated first to decide whether to enter global search or local search: if $R_4 \geq \varepsilon$, the global search stage is entered. In the global search stage, an adaptive factor $w$ that changes with the iteration number is introduced, and the positions of the chimpanzee individuals in the population are updated according to the factor's change curve,

$w = \alpha(\cosh(\pi t / T_{max}) + \delta)$ (12)

$X_1 = w_1 \cdot \{X_{attacker} - a_1 |C_1 X_{attacker} - m_1 X|\}$ (13)

$X_2 = w_2 \cdot \{X_{chaser} - a_2 |C_2 X_{chaser} - m_2 X|\}$ (14)

$X_3 = w_3 \cdot \{X_{barrier} - a_3 |C_3 X_{barrier} - m_3 X|\}$ (15)

$X_4 = w_4 \cdot \{X_{driver} - a_4 |C_4 X_{driver} - m_4 X|\}$ (16)

$X_5 = w_5 \cdot \{X_{observer} - a_5 |C_5 X_{observer} - m_5 X|\}$ (17)
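Equations (12) to (17) share one pattern, which the following Python sketch makes explicit. The values of `alpha` and `delta` are not given in this text and are left as caller-supplied assumptions; `m` is treated as a scalar for simplicity. How the five candidate positions are finally combined is not stated here, so the sketch stops at computing a single candidate.

```python
import math
from typing import List


def adaptive_factor(t: int, t_max: int, alpha: float, delta: float) -> float:
    """Equation (12): w = alpha * (cosh(pi * t / T_max) + delta)."""
    return alpha * (math.cosh(math.pi * t / t_max) + delta)


def candidate_position(leader: List[float], current: List[float],
                       w: float, a: float, c: float, m: float) -> List[float]:
    """Equations (13)-(17), element-wise: w * (X_leader - a * |c * X_leader - m * X|)."""
    return [w * (xl - a * abs(c * xl - m * x)) for xl, x in zip(leader, current)]
```

Calling `candidate_position` once per leader (attacker, chaser, barrier, driver, observer) with that leader's own `w`, `a`, `c` and `m` yields $X_1$ through $X_5$.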

Step 7: if $R_4 < \varepsilon$, the local search stage is entered. To prevent the algorithm from becoming trapped in local optima in the early or late iterations, the local search stage checks whether the current iteration number $t$ is less than the specified iteration number $t_*$. If so, the conversion parameter $\beta$ is updated and the attacker's position information is updated with the improved sine cosine algorithm; otherwise, Gaussian mutation is applied to the attacker's position,

After the sine cosine algorithm is introduced, the position updating formula of the attacker in the population is as follows:

where $X_{attacker}(t)$ denotes the position of the attacker in the t-th iteration; $\rho_1$, $\rho_2$ and $\rho_3$ are random numbers with $\rho_1 \in [0, 2\pi]$, $\rho_2 \in [0, 2]$ and $\rho_3 \in [0, 1]$; $P(t)$ denotes the position of the current optimal individual; and $\beta$ is the conversion parameter, calculated as:

where $\beta_{max}$ and $\beta_{min}$ are the maximum and minimum values of the conversion parameter and $T_{max}$ is the maximum number of iterations,

The mathematical model of Gaussian mutation for the attacker position is as follows:

$X'_{attacker} = X \times [1 + k \times N(0, 1)]$ (21)

where $X'_{attacker}$ is the updated position vector, $X$ is the position vector of the current chimpanzee individual, $k \in [0, 1]$, and $N(0, 1)$ is a Gaussian random vector with mean 0 and variance 1;
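Equation (21) can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes an independent $N(0, 1)$ draw for each dimension, which is one reading of "Gaussian random vector".

```python
import random
from typing import List


def gaussian_mutation(position: List[float], k: float, rng: random.Random) -> List[float]:
    """Equation (21): X' = X * [1 + k * N(0, 1)], with one normal draw per dimension."""
    return [x * (1.0 + k * rng.gauss(0.0, 1.0)) for x in position]
```

Because the perturbation is multiplicative, coordinates that are exactly zero stay at zero and large coordinates move further than small ones; with `k = 0` the position is returned unchanged.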

Step 8: calculate the new fitness from the new solutions, obtain the optimal individual and its position information, and check whether the algorithm meets the iteration termination condition: if the maximum number of iterations $T_{max}$ is reached, terminate and output the optimal position; otherwise return to Step 4 and execute again.

Further, the express delivery customer churn prediction method based on value subdivision and integrated prediction includes: when the XGBoost model is constructed in step (3), the time-series survey data of customer loss under each influence index is used as the main feature input, the corresponding influence indexes of loss are used as labels, and the parameter values obtained from the chimpanzee optimization algorithm are used to establish the optimal tree-structure model that minimizes the objective function, specifically comprising the following steps:

Step 1: in the BSGChOA_XGBoost model training stage, the prediction accuracy of the model is chosen as the fitness function of a chimpanzee individual; the population is first randomly initialized, and the initial value and value range of each model parameter are set;

Step 2: part of the sample data under the criterion layer of the influence index set is selected as a training set for model training and parameter optimization, and the fitness function value of the model is calculated; this value represents the best solution obtained in each run of the chimpanzee optimization algorithm. The remaining samples are used as a test set for the final evaluation of model performance. The training set is sampled for prediction, and the prediction results are then averaged to obtain the final prediction result,

Step 3: verify the trained model with the test set and evaluate, according to the fitness function value, whether the value of each parameter reaches the current optimum; if so, replace the original parameters; otherwise, keep the current parameters;

Step 4: check whether the algorithm meets the iteration termination condition; if the maximum number of iterations is reached, terminate and output the optimal values of all parameters found during the iterations; otherwise, return to Step 2 and execute again.
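The control flow of Steps 1 to 4 (evaluate fitness, keep the better parameters, stop at the iteration limit) can be sketched independently of the optimizer and of the model. In the Python sketch below, plain uniform random sampling stands in for the BSGChOA position updates, and `fitness` is a caller-supplied function standing in for the accuracy of a trained XGBoost model; both substitutions are assumptions made to keep the example self-contained.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def tune_parameters(fitness: Callable[[Params], float],
                    bounds: Dict[str, Tuple[float, float]],
                    max_iter: int,
                    rng: random.Random) -> Tuple[Params, float]:
    """Keep the best-scoring parameter set seen within max_iter evaluations."""
    # Step 1: random initialization inside each parameter's value range.
    best = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
    best_fit = fitness(best)
    for _ in range(max_iter):  # Step 4: stop at the maximum iteration count
        # Stand-in for one BSGChOA update: draw a fresh candidate (assumption).
        candidate = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        candidate_fit = fitness(candidate)  # Step 2: fitness of the candidate
        if candidate_fit > best_fit:        # Step 3: replace only if better
            best, best_fit = candidate, candidate_fit
    return best, best_fit
```

In a full implementation, `fitness` would train the model with the candidate parameters on the training set and return its prediction accuracy.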

Further, the express customer churn prediction method based on value subdivision and integrated prediction comprises the following steps: the single customer attrition rate prediction module comprises the steps of:

step 1: generating various data characteristics of the client and constructing a client information system;

step 2: according to the information of the customer information system, a two-layer integrated learning algorithm is used for predicting the attrition rate of a single customer,

a first layer:

1) select $n$ base classifiers and label them;

2) select the feature attribute sets $F_1 = \{b_1, b_2, \ldots, b_5\}$ and $F_3 = \{b_8, b_9, \ldots, b_{17}\}$ to generate the data sets D1 and D2; the first $m$ base classifiers are trained on the training sample set with D1 and D2 as their input sets, yielding $m$ prediction results $P^k$; then select $F_4 = \{b_{18}, b_{19}, \ldots, b_{23}\}$ together with the prediction results $P$ as input data of the remaining $n - m$ classifiers to obtain $n - m$ prediction results $P^k$;

3) integrate the training results of the $n$ individual models and construct the n-dimensional feature vector $P = (P_1, P_2, P_3, \ldots, P_n)^T$;

A second layer:

1) the output feature vector $P$ is used as the input of a linear classifier, and the weight $weight_j$ of each model in the ensemble is learned by gradient descent; the final prediction result $P^*$ based on the $n$ base prediction models is then obtained as the weighted final probability of customer churn,

where $Churn_j(u, i)$ denotes the loss prediction probability produced by the j-th model and $weight_j$ denotes the weight assigned to the j-th model.
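The second layer can be sketched as follows. The sketch assumes logistic regression as the "linear classifier" and batch gradient descent on the log loss; the source names neither, and the learning rate, epoch count and function names are illustrative. Each row of `P` holds the $n$ base-model churn probabilities for one customer.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def learn_weights(P: List[List[float]], y: List[int],
                  lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Learn one weight per base model (plus a bias) by batch gradient descent."""
    n_models = len(P[0])
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for probs, label in zip(P, y):
            error = sigmoid(sum(w * p for w, p in zip(weights, probs)) + bias) - label
            for j in range(n_models):
                grad_w[j] += error * probs[j]
            grad_b += error
        weights = [w - lr * g / len(P) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(P)
    return weights, bias


def churn_probability(probs: List[float], weights: List[float], bias: float) -> float:
    """Weighted combination of the base-model churn probabilities."""
    return sigmoid(sum(w * p for w, p in zip(weights, probs)) + bias)
```

A customer would then be flagged as likely to be lost when `churn_probability` exceeds a chosen threshold such as 0.5, and the probability itself serves as the predicted loss rate.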

Further, the express delivery customer churn prediction method based on value subdivision and integrated prediction includes: the data features in Step 1 consist of the customer basic features $F_1$, the order features $F_2$, the customer-order interaction features $F_3$, and the order activity features $F_4$.

Further, the express delivery customer churn prediction method based on value subdivision and integrated prediction includes: the customer saving scheme design module comprises the following steps:

Step 1: with reference to the customer loss rate and the customer loss amount predicted by the prediction model, revisit the customers with the highest loss probability first and offer them returning-customer discounts;

Step 2: for the queuing index $L_s$, an influence index of customer loss at a network point, construct a target constraint model relating customer loss to waiting time, service intensity and queue length, with the aim that the customer's waiting time does not exceed the longest service time the customer can tolerate and that the service intensity of the service staff does not exceed the maximum service intensity; this provides enterprises with more realistic and credible data support while customer loss is minimized,

the objective function is:

the constraint conditions are as follows:

where $\rho$ is the average service intensity,

$\rho_{max}$ is the maximum service intensity that the service staff can withstand,

$W_s$ is the average waiting time of the customer,

$W_{s\text{-}max}$ is the maximum residence time that the customer can tolerate,

$L_s$ is the average queue length.
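The objective function and constraints are not reproduced in this text, so the sketch below only illustrates how the named quantities relate under an assumed M/M/1 queue (single server, Poisson arrivals, exponential service times). The queue model, the parameter names `arrival_rate` and `service_rate`, and the feasibility check are illustrative assumptions, not the patented model.

```python
from typing import Tuple


def mm1_metrics(arrival_rate: float, service_rate: float) -> Tuple[float, float, float]:
    """Return (rho, L_s, W_s) for a stable M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival_rate must be below service_rate")
    rho = arrival_rate / service_rate           # service intensity
    l_s = rho / (1.0 - rho)                     # mean number of customers in the system
    w_s = 1.0 / (service_rate - arrival_rate)   # mean time a customer spends in the system
    return rho, l_s, w_s


def satisfies_constraints(arrival_rate: float, service_rate: float,
                          rho_max: float, w_s_max: float) -> bool:
    """Check rho <= rho_max and W_s <= W_s-max for the assumed queue."""
    rho, _, w_s = mm1_metrics(arrival_rate, service_rate)
    return rho <= rho_max and w_s <= w_s_max
```

With 2 arrivals and 4 services per unit time, for instance, the service intensity is 0.5, one customer is in the system on average, and each spends 0.5 time units there, consistent with Little's law $L_s = \lambda W_s$.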

The invention combines the LSRMT model, the improved chimpanzee optimization algorithm and the ensemble learning algorithm. It can classify customers accurately, can predict with high precision whether a customer will be lost, the loss probability, and the number of customers lost at each network point, and provides personalized loss early warning according to the prediction results.

Drawings

FIG. 1 is a block diagram of the express customer loss prediction method based on value subdivision and integrated prediction;

FIG. 2 is a flow diagram of a customer value segmentation module;

FIG. 3 is a flow chart of a modified chimpanzee optimization algorithm (BSGChOA) based on a hybrid strategy;

FIG. 4 is a flow diagram of the BSGChOA_XGBoost model prediction module;

FIG. 5 is a flow diagram of an integrated predictive model based on a customer information system;

FIG. 6 is a set of index systems affecting customer churn in the express industry;

FIG. 7 is a set of customer attribute features constructed based on feature engineering.

Detailed Description

In order to make the implementation purpose, technical scheme and advantages of the invention clearer, the technical scheme of the invention is clearly and completely described in the following steps with the combination of the attached drawings.

As shown in FIG. 1, the express customer loss prediction method based on value subdivision and integrated prediction provided by the invention comprises a customer value subdivision module, a loss prediction and early warning module, and a personalized saving module. Customers are first subdivided by extracting customer value index data and calculating value scores, and loss prediction and early warning are then carried out for each class of customers. The loss prediction module comprises loss rate prediction for a single customer and loss amount prediction for network point customers under different influence indexes. According to the prediction results, the current value and the future loss of customers of different value are then considered along the different influence index dimensions, and an early warning scheme for customer loss is provided to the enterprise. The customer saving scheme design module provides personalized saving schemes for customers of different value according to the key influence indexes of customer loss and the value importance of each customer. In particular, a target constraint model relating the influence index of queuing time to customer loss is designed to find the optimal queuing time, which ensures that the enterprise obtains a certain service volume at the minimum cost of customer loss and provides the enterprise with more realistic and credible data support.

As shown in fig. 2, the customer value segmentation module is mainly used for customer value estimation and customer classification. In order to better understand the customer value, the LSRMT customer value subdivision model is designed by improving the RFM model. And the client relation duration, the sending activity, the receiving activity, the average consumption amount and the client trust index are introduced, the index values are subjected to initial grade division, then the index weights are determined according to a dual-target constraint model of the index weights, and finally the final value scores of the clients are calculated through summing the index value indexes, so that the classification of the clients is realized.

As shown in FIG. 3 and FIG. 4, the network point customer loss prediction module comprises constructing an influence index system of customer loss, improving the chimpanzee optimization algorithm, and predicting loss with a customer loss prediction model that fuses the improved chimpanzee optimization algorithm (BSGChOA) with XGBoost. With reference to the current state of development of the express industry, a three-layer index system comprising a target layer, a criterion layer and an index layer is constructed, as shown in FIG. 6. During model training, a statistical data set of customer loss is introduced, the parameters of the training model are optimized and selected with the improved BSGChOA algorithm, and a BSGChOA_XGBoost model relating the statistical data set of customer loss to the key factors influencing customer loss is established.

As shown in FIG. 5, the single customer loss rate prediction module mainly comprises constructing a customer information system and predicting the single customer loss rate with an ensemble learning model. As shown in FIG. 7, the module generates new feature attributes from the customers' original behavior data and constructs the customer information system along several dimensions: customer basic features, order features, customer-order interaction features, and order activity. Several ensemble learning models are used as base prediction classifiers and are trained on an attribute feature subset consisting of the customer basic features and the customer-order interaction features; the weights of the sub-models are then trained by a linear classifier; and finally whether a customer will be lost, and the loss rate, are predicted from the weighted result.

The early warning and personalized saving module provides personalized saving schemes for customers of different value according to the influence index system of customer loss and the value importance of each customer. Its design focuses on a target constraint model relating the influence index of queuing time to customer loss, so that enterprises receive more realistic and credible data support while customer loss is minimized.

The customer value segmentation module comprises the following steps:

Step 1: select the customer data corresponding to the customer value indexes (LSRMT) from the enterprise's historical customer data set. The customer relationship duration L is the time interval, in days, from the customer's first order to the current order. The customer sending activity S is the number of items the customer successfully sent in the last month. The customer receiving activity R is the number of items the customer successfully received in the last month. The average customer spend M is the customer's total spend in the last month divided by the total number of items sent. The customer trust T is the total number of times the customer cancelled a sending or a receipt midway, and expresses the degree to which the customer relies on and trusts the enterprise. Each index is then assigned an initial grade $x_j$ according to the sorted data set.

where $j = 1, 2, 3, 4, 5$; $a$ denotes a lower threshold and $b$ an upper threshold, whose specific values are chosen reasonably according to the actual data set.
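The grading formula is not reproduced in this text. As a minimal sketch, the Python below assumes a three-level grade driven by the two thresholds $a$ and $b$, with an option to reverse the scale for an index such as T, where a larger count of cancellations would indicate lower trust. The three levels, the reversal flag and the function names are all assumptions for illustration.

```python
def average_spend(total_amount: float, total_items_sent: int) -> float:
    """Index M: total spend in the last month divided by the number of items sent."""
    if total_items_sent == 0:
        return 0.0
    return total_amount / total_items_sent


def initial_grade(value: float, a: float, b: float, higher_is_better: bool = True) -> int:
    """Assumed three-level grade: below a -> 1, between a and b -> 2, above b -> 3."""
    if value < a:
        level = 1
    elif value <= b:
        level = 2
    else:
        level = 3
    # Reverse the scale for indexes where a larger raw value is worse.
    return level if higher_is_better else 4 - level
```

For example, with thresholds `a = 10` and `b = 20`, a customer who sent 25 items in the last month would receive grade 3 on index S.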

Step 2: measure the value information $VI_{ij}$ of each value index from the selected customer sample data. Because the index weights are allocated according to the current value information $VI_{ij}$ of each index, the model must describe the uncertainty of the index value $VI_{ij}$ in the sample data and the relationship between the currently selected sample weight component $w_{ij}$ and the index weight $W_j$ in order to calculate the customer index weights more accurately. The data is therefore substituted into a dual-objective constrained optimization model, based on minimizing the uncertainty of the value information $VI_{ij}$ and the inconsistency between the index weight $W_j$ and the sample weight component $w_{ij}$, and the preferred weights are solved.

Value information uncertainty objective function:

weight consistency objective function:

constraint (sum of index weights is 1):

where $m$ is the number of selected customer index samples, $W_j$ is the weight of the j-th index, and $w_{ij}$ is the weight component of the j-th index for the i-th sample.

$V_j = x_j \times W_j$

The value indices $V_j$ of the customer's indexes are summed to obtain the total value score $V_{sum}$ of the current customer; the larger the total score, the greater the customer's value is considered to be.

Step 4: rank the customers by total value score $V_{sum}$ and divide them into value classes: core value customers (user1), general value customers (user2), and potential value customers (user3).

After the customers have been classified by value, loss prediction is performed for them. To improve the accuracy of the prediction algorithm and to keep it from falling into local optima during the search stage, the scheme optimizes the parameters of the training model with an improved chimpanzee optimization algorithm based on a hybrid strategy (BSGChOA). The network point customer loss prediction module comprises the following steps:

Step 1: initialize the parameter settings of the improved chimpanzee optimization algorithm: chimpanzee population size $N = 30$, maximum number of iterations $T_{max} = 500$, maximum conversion parameter $\beta_{max} = 10$, minimum conversion parameter $\beta_{min} = 1$, minimum arrest rate $P_{min} = 0.7$, specified iteration number $t_* = 200$, and switching coefficient $\varepsilon = 0.4$.
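These settings can be collected in one immutable configuration object, as in the Python sketch below. The class name, field names and the two sanity checks are hypothetical; only the default values come from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BSGChOAConfig:
    """Parameter values stated in Step 1 of the embodiment."""
    population_size: int = 30   # N
    max_iterations: int = 500   # T_max
    beta_max: float = 10.0      # maximum conversion parameter
    beta_min: float = 1.0       # minimum conversion parameter
    p_min: float = 0.7          # minimum arrest rate
    t_star: int = 200           # specified iteration number t_*
    epsilon: float = 0.4        # switching coefficient

    def __post_init__(self) -> None:
        # Illustrative sanity checks; they are not part of the source.
        if self.beta_min > self.beta_max:
            raise ValueError("beta_min must not exceed beta_max")
        if self.t_star > self.max_iterations:
            raise ValueError("t_star must not exceed max_iterations")
```

Freezing the dataclass prevents the settings from being changed by accident in the middle of an optimization run.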

Step 2: Generate the initial population. An improved Bernoulli chaotic mapping is applied to the positions of the initial population, with a random variable factor introduced to make the initial population more uniformly distributed. The chaotic mapping generates a chaotic sequence in the interval [0, 1], which is then transformed into the individual search space to produce the initial population.

representing the function value of the k-th mapping.
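The idea of Step 2 can be sketched in Python as follows. The improved mapping of the invention is not reproduced in this text, so the sketch substitutes the classic Bernoulli shift map; the control parameter `lam`, the random starting value used as a stand-in for the random variable factor, and the function names are all assumptions.

```python
import random
from typing import List


def bernoulli_sequence(length: int, z0: float, lam: float = 0.4) -> List[float]:
    """Classic Bernoulli shift map on [0, 1] (assumed stand-in, not the improved map)."""
    seq, z = [], z0
    for _ in range(length):
        if z <= 1.0 - lam:
            z = z / (1.0 - lam)
        else:
            z = (z - (1.0 - lam)) / lam
        z = min(max(z, 0.0), 1.0)  # guard against floating-point drift
        seq.append(z)
    return seq


def init_population(n: int, dim: int, lower: float, upper: float,
                    seed: int = 0) -> List[List[float]]:
    """Map chaotic values in [0, 1] into the search space [lower, upper]."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        z0 = rng.uniform(0.01, 0.99)  # assumed random factor: random start value
        chaos = bernoulli_sequence(dim, z0)
        population.append([lower + c * (upper - lower) for c in chaos])
    return population
```

With N = 30 as in Step 1, `init_population(30, 5, -10.0, 10.0)` returns 30 positions of dimension 5, each inside the search bounds.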

Step 3: Calculate the fitness of each individual in the chimpanzee population, select the 5 individuals with the best fitness, and record their positions as X_attacker, X_observer, X_chaser, X_barrier, and X_driver.

Step 4: To balance the exploration and exploitation capabilities of the algorithm, a nonlinear convergence factor f(t) that changes dynamically with the iteration number is introduced and substituted into the following formulas to update the coefficient vectors a and c of the chimpanzee search stage.

a = 2 × f(t) × R₃ − f(t) (9)

c = 2 × R₅ (10)

where R₁, R₃, and R₅ are random factors in [0, 1], T_max is the maximum number of iterations, and k is an adjusting factor with k ∈ [1, 5].
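Formulas (9) and (10) can be sketched as below. The convergence factor f(t) is passed in as a plain number because its formula is not reproduced in this text; drawing one random factor per dimension is an assumption, and the function name is hypothetical.

```python
import random
from typing import List, Tuple


def coefficient_vectors(f_t: float, dim: int,
                        rng: random.Random) -> Tuple[List[float], List[float]]:
    """Return (a, c) with a = 2*f(t)*R3 - f(t) and c = 2*R5, element-wise."""
    a = [2.0 * f_t * rng.random() - f_t for _ in range(dim)]  # R3 in [0, 1)
    c = [2.0 * rng.random() for _ in range(dim)]              # R5 in [0, 1)
    return a, c
```

Each component of a then lies in [-f(t), f(t)], so a shrinking f(t) narrows the search step as the iterations proceed.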

Step 5: investigator

And (4) updating the position. Other individuals need to determine whether to take hunting actions according to the current position information of the investigator. If the current prey arrest success rate P _arrested Greater than the minimum arrest rate P _min Then, the attacker, the obstacle, the driver and the chaser immediately take hunting action, the inspector continuously searches the next hunter and updates the position information according to the mapped initialization population information and the position relationship between the inspector and other chimpanzee individuals.

where P_arrested denotes the current arrest rate, P_min denotes the minimum arrest rate, R₂ denotes a random number in [0, 1], and SND is a random number that follows the standard normal distribution.

Step 6: After the investigator's position has been updated, the remaining individuals enter the search iteration stage. The random factor R₄ is updated first to decide whether to enter global search or local search. If R₄ ≥ ε, the global search stage is entered. In the global search stage, an adaptive factor w that changes with the iteration number is introduced, and the positions of the chimpanzee individuals in the population are updated according to the factor's change curve.

w = α(cosh(πt/T_max) + δ) (12)

X₁ = w₁ × {X_attacker − a₁|c₁X_attacker − m₁X|} (13)

X₂ = w₂ × {X_chaser − a₂|c₂X_chaser − m₂X|} (14)

X₃ = w₃ × {X_barrier − a₃|c₃X_barrier − m₃X|} (15)

X₄ = w₄ × {X_driver − a₄|c₄X_driver − m₄X|} (16)

X₅ = w₅ × {X_observer − a₅|c₅X_observer − m₅X|} (17)
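Formulas (12) to (17) share one pattern, which the following Python sketch applies element-wise. The constants α and δ and the vector m are not defined in this text, so they are treated here as plain inputs; the function names are hypothetical, and the way the five candidates are combined into the next position is not shown because that formula is not reproduced here.

```python
import math
from typing import List


def adaptive_factor(t: int, t_max: int, alpha: float, delta: float) -> float:
    """w = alpha * (cosh(pi * t / T_max) + delta), formula (12)."""
    return alpha * (math.cosh(math.pi * t / t_max) + delta)


def candidate_position(w: float, leader: List[float], x: List[float],
                       a: List[float], c: List[float],
                       m: List[float]) -> List[float]:
    """One of formulas (13)-(17): w * (X_leader - a * |c * X_leader - m * X|)."""
    return [
        w * (l - ai * abs(ci * l - mi * xi))
        for l, xi, ai, ci, mi in zip(leader, x, a, c, m)
    ]
```

Calling `candidate_position` once per leader (attacker, chaser, barrier, driver, observer) yields the five candidates X₁ to X₅.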

Step 7: If R₄ < ε, the local search stage is entered. To guard against the local optima that may occur in the early and late stages of the iteration, a check on the iteration count is introduced: if the current iteration number t is less than the specified iteration number t_*, the conversion parameter β is updated and the attacker's position information is updated with an improved sine-cosine algorithm; otherwise, Gaussian mutation is applied to the attacker's position. The sine-cosine mechanism compensates for the slow convergence of global search during the local search stage, improves the search capability, and accelerates convergence. Gaussian mutation lets the curve fluctuate more strongly late in the iteration so that the algorithm quickly escapes a local optimum. At this stage, the remaining chimpanzee individuals still change position according to the original position update formula.

where X_attacker(t) denotes the position of the attacker in the t-th iteration; ρ₁, ρ₂, and ρ₃ are random numbers with ρ₁ ∈ [0, 2π], ρ₂ ∈ [0, 2], and ρ₃ ∈ [0, 1]; and p(t) denotes the position of the currently optimal individual. β is the conversion parameter, calculated as follows:

where β_max and β_min are the maximum and minimum values of the conversion parameter, respectively, and T_max is the maximum number of iterations.

X'_attacker = X × [1 + k × N(0, 1)] (21)

where X'_attacker is the updated position vector, X is the position vector of the current chimpanzee individual, k is a variable that decreases within [0, 1], and N(0, 1) is a Gaussian-distributed random vector with mean 0 and variance 1.
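The Gaussian mutation of formula (21) can be sketched in a few lines of Python. Drawing an independent N(0, 1) sample per dimension is an assumption about how the random vector is applied; the schedule by which k decreases is not given in this text, so k is passed in directly.

```python
import random
from typing import List


def gaussian_mutation(x: List[float], k: float,
                      rng: random.Random) -> List[float]:
    """X' = X * [1 + k * N(0, 1)], applied element-wise (formula (21))."""
    return [xi * (1.0 + k * rng.gauss(0.0, 1.0)) for xi in x]
```

With k = 0 the position is returned unchanged, so a decreasing k gradually damps the perturbation.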

Step 8: Calculate the new fitness, optimal individual, and position information from the new solution obtained, and check whether the algorithm meets the iteration termination condition. If the maximum number of iterations T_max has been reached, terminate and output the optimal position; otherwise, return to Step 4 and re-execute.

Step 9: Construct the BSGChOA_XGBoost model for training. When the XGBoost model is constructed, the time-series survey data of customer loss under each influence index are used as the main feature input and the corresponding loss-influencing indexes are used as labels; the values of the model parameters are optimized by BSGChOA to minimize the objective function and thereby establish the optimal tree-structure model. This comprises the following steps:

Step 9.1: In the BSGChOA_XGBoost training stage, the prediction accuracy of the model is selected as the fitness function of a chimpanzee individual. First, the population is randomly initialized, and the initial value and value range of each model parameter are set: the learning rate learning_rate defaults to 0.30, with range 0.05-0.30; the minimum loss reduction required to split a node, gamma, defaults to 0, with range 0-0.20; the maximum tree depth max_depth defaults to 6, with range 4-10; the minimum sum of sample weights in a leaf node, min_child_weight, defaults to 1, with range 1-10; and the weight of the L2 regularization term, lambda, defaults to 1, with range 0.1-10.

Step 9.2: Select 80% of the sample data in criterion layer B of the influence index set for model training and parameter optimization, and calculate the fitness function value of the model; this value represents the optimal solution obtained by each run of the chimpanzee optimization algorithm. The remaining 20% serves as the test set for the final evaluation of model performance. Sampling prediction is performed on the training set, and the prediction results are averaged to obtain the final prediction result.

Step 9.3: Verify the model with the test set and, according to the fitness function value, evaluate whether the value of each parameter has reached the current optimum. If so, replace the original parameter; otherwise, keep the current parameters.

Step 9.4: Check whether the algorithm meets the iteration termination condition. If the maximum number of iterations has been reached, terminate and output the optimal values of all parameters found during the iteration; otherwise, return to Step 9.2 and re-execute.
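The parameter ranges listed in Step 9.1 can be written down as a search space, as in the following Python sketch. The ranges and defaults are taken from the text; the key `reg_lambda` is the name XGBoost's scikit-learn wrapper uses for the L2 term, and the mapping from a unit-cube position to parameter values, as well as treating only max_depth as an integer, are assumptions for illustration.

```python
from typing import Dict, List, Union

# (lower bound, upper bound, default) for each parameter named in Step 9.1
SEARCH_SPACE = {
    "learning_rate": (0.05, 0.30, 0.30),
    "gamma": (0.0, 0.20, 0.0),
    "max_depth": (4, 10, 6),
    "min_child_weight": (1, 10, 1),
    "reg_lambda": (0.1, 10.0, 1.0),
}
INTEGER_PARAMS = {"max_depth"}  # assumption: only the tree depth is integral


def decode(position: List[float]) -> Dict[str, Union[int, float]]:
    """Map a position in [0, 1]^5 onto the parameter ranges (assumed encoding)."""
    if len(position) != len(SEARCH_SPACE):
        raise ValueError("position must have one entry per parameter")
    params = {}
    for p, (name, (low, high, _default)) in zip(position, SEARCH_SPACE.items()):
        p = min(max(p, 0.0), 1.0)  # clip positions that left the unit cube
        value = low + p * (high - low)
        params[name] = int(round(value)) if name in INTEGER_PARAMS else value
    return params
```

A chimpanzee position decoded this way can be passed to the model as keyword arguments, and the resulting prediction accuracy serves as the fitness value.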

The prediction module for a single customer attrition rate comprises the following steps:

Step 1: Generate the data features of the customer from the behavior features, namely customer base features F₁, order features F₂, customer-order interaction features F₃, and order liveness features F₄, and construct the customer information system.

Step 2: Predict the loss rate of a single customer with a two-layer ensemble learning algorithm, using the information in the customer information system.

A first layer:

1) Select n base classifiers and label them. Here n is generally in the range [3, 5]: too small an n gives insufficient prediction accuracy, while too large an n raises the complexity and cost of the algorithm.

2) Select the feature attribute sets F₁ = {b₁, b₂, ..., b₅} and F₃ = {b₈, b₉, ..., b₁₇} to generate data sets D1 and D2, respectively. The first m base classifiers are trained on the training sample set with D1 and D2 as input sets, yielding m prediction results P^k. Because F₄ best reflects, indirectly, the likelihood of customer loss, F₄ = {b₁₈, b₁₉, ..., b₂₃} is selected as the input data of the remaining n-m classifiers, yielding n-m prediction results P^k.

3) Integrate the training results of the n individual models by constructing the n-dimensional feature vector P = (P₁, P₂, P₃, ..., P_n)^T.

A second layer:

1) The output feature vector P is used as the input of a linear classifier, and the weight of each model in the ensemble is learned by gradient descent. The final prediction result based on the n base prediction models is P^*, the final probability of customer loss obtained by weighting the base predictions.
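The second layer can be sketched in Python as follows. The weighting formula itself is not reproduced in this text, so the sketch assumes a plain linear combination of the base predictions, fitted by batch gradient descent on the squared error and clipped to [0, 1]; the function names, learning rate, and epoch count are illustrative choices.

```python
from typing import List


def fit_weights(base_preds: List[List[float]], labels: List[float],
                lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Learn one weight per base model by gradient descent on mean squared error."""
    n_models = len(base_preds[0])
    n = len(labels)
    w = [0.0] * n_models
    for _ in range(epochs):
        grad = [0.0] * n_models
        for p, y in zip(base_preds, labels):
            err = sum(wj * pj for wj, pj in zip(w, p)) - y
            for j in range(n_models):
                grad[j] += 2.0 * err * p[j] / n
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w


def combine(w: List[float], p: List[float]) -> float:
    """Weighted final loss probability, clipped to [0, 1] (assumed form)."""
    score = sum(wj * pj for wj, pj in zip(w, p))
    return min(1.0, max(0.0, score))
```

Here each row of `base_preds` is the vector P of one training customer, so a base model whose output tracks the labels closely ends up with a larger weight than an uninformative one.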

The customer saving scheme design module comprises the following steps:

Step 1: The preceding steps have implemented customer value measurement and value classification, and have predicted the customer loss rate and the customer loss amount at the network point from historical customer consumption data. To give the enterprise early-warning and saving schemes for the loss of customers of different value, the customer loss rate and customer loss amount predicted by the prediction model are consulted, the customers with the highest loss probability are revisited first, and preferential activities for returning old users are offered.

Step 2: For the queue length L_s, an influence index that affects customer loss at the network point, a target constraint model relating customer loss to waiting time, service intensity, and queue length is constructed. The aim is to ensure that the customer's waiting time does not exceed the longest time the customer can tolerate and that the service intensity of the service personnel does not exceed the maximum service intensity, so that the enterprise is given more realistic and credible data support while customer loss is minimized.

an objective function:

constraint conditions are as follows:

where ρ is the average service intensity, W_s is the average waiting time of the customer, W_s-max is the maximum residence time that the customer can tolerate, and L_s is the average queue length.
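The two constraints can be checked numerically with a short Python sketch. The text does not state which queueing model underlies ρ, W_s, and L_s, so the sketch assumes a single-server M/M/1 queue with arrival rate λ and service rate μ purely for illustration; `rho_max` and the function names are hypothetical.

```python
from typing import Tuple


def mm1_metrics(arrival_rate: float, service_rate: float) -> Tuple[float, float, float]:
    """Return (rho, L_s, W_s) for a stable M/M/1 queue (assumed model)."""
    if arrival_rate <= 0 or arrival_rate >= service_rate:
        raise ValueError("need 0 < arrival_rate < service_rate for a stable queue")
    rho = arrival_rate / service_rate            # average service intensity
    l_s = rho / (1.0 - rho)                      # average number in the system
    w_s = 1.0 / (service_rate - arrival_rate)    # average time in the system
    return rho, l_s, w_s


def is_feasible(arrival_rate: float, service_rate: float,
                rho_max: float, w_s_max: float) -> bool:
    """Check rho <= rho_max and W_s <= W_s-max."""
    rho, _l_s, w_s = mm1_metrics(arrival_rate, service_rate)
    return rho <= rho_max and w_s <= w_s_max
```

With 3 arrivals and 4 services per unit time, the sketch gives ρ = 0.75, L_s = 3, and W_s = 1, which is feasible for a maximum service intensity of 0.8 and a tolerable residence time of 1.5.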

Of course, various modifications and alterations of this invention may be made by those skilled in the art without departing from the spirit and scope of this invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. An express customer loss prediction method based on value subdivision and integrated prediction, characterized by comprising a customer value subdivision module, a loss prediction and early warning module, and a personalized saving module,

wherein the early warning and personalized saving module provides personalized saving schemes for customers of different value according to the influence index system of customer loss and the value importance of the customers, and establishes a target constraint model between the queuing-time influence index and customer loss, so as to give the enterprise more realistic and credible data support while minimizing customer loss.

2. The express customer loss prediction method based on value subdivision and integrated prediction according to claim 1, wherein the customer value subdivision module comprises the following steps:

Step 1: defining the following customer value indexes: customer relationship duration L, customer sending activity S, customer receiving activity R, average customer spending amount M, and customer trust level T, and performing an initial grade division of the indexes according to the sorted data set,

(1)

where j = 1, 2, 3, 4, 5; a denotes the lower threshold and b denotes the upper threshold, with specific values chosen by binning the actual data set;

Step 2: computing the value information VI_ij of each value index from the selected customer sample data, and substituting the data into a dual-objective constrained optimization model, based on the uncertainty of the value information VI_ij and the minimum consistency between the index weight W_j and the sample weight component w_ij, to solve for a better set of weights:

(2)

value information uncertainty objective function:

（3）

weight consistency objective function:

（4）

constraint conditions are as follows:

（5）

where m is the number of samples of the selected customer index, W_j denotes the weight of the j-th index, and w_ij denotes the weight component of the j-th index of the i-th sample;

Step 3: calculating the value index V_j of each customer index from its initial value score x_j and the weight W_j assigned to the index, where V_j expresses the customer's score at each index layer, by the formula

V_j = x_j × W_j,

and summing the value indexes V_j of all customer indexes to obtain the total value score V_sum,

(6)

Step 4: ranking the customers by total value score V_sum and grading them by value.

3. The express customer loss prediction method based on value subdivision and integrated prediction according to claim 1, wherein the network-point customer loss prediction module comprises the following steps: (1) constructing an index system of the factors influencing customer loss;

(3) constructing and optimizing the BSGChOA_XGBoost model for training;

(4) predicting customer loss under the different indexes.

4. The express customer loss prediction method based on value subdivision and integrated prediction according to claim 3, wherein in step (1) the index system influencing customer loss is a 3-layer index system set comprising a target layer, a criterion layer, and an index layer.

5. The express customer loss prediction method based on value subdivision and integrated prediction according to claim 3, wherein step (2) comprises the following steps:

(7)

denoting the function value of the k-th mapping;

Step 3: calculating the fitness of each individual in the chimpanzee population, selecting the 5 individuals with the best fitness, and recording their positions as X_attacker, X_observer, X_chaser, X_barrier, and X_driver;

Step 4: updating the convergence factor f(t) and the coefficient vectors a and c,

(8)

a = 2 × f(t) × R₃ − f(t) (9)

c = 2 × R₅ (10)

where R₁, R₃, and R₅ are random factors in [0, 1], T_max is the maximum number of iterations, and k is the adjusting factor with k ∈ [1, 5];

Step 5: updating the position of the investigator, the other individuals deciding whether to take hunting action according to the investigator's current position information; if the current prey arrest success rate P_arrested is greater than the minimum arrest rate P_min, the attacker, the barrier, the driver, and the chaser immediately take hunting action, while the investigator goes on to search for the next prey and updates its position information according to the mapped initial population information and its positional relationship to the other chimpanzee individuals, by the formula:

(11)

where P_arrested denotes the current arrest rate, P_min denotes the minimum arrest rate, R₂ denotes a random number in [0, 1], and SND is a random number that follows the standard normal distribution;

Step 6: after the investigator's position has been updated, the remaining individuals enter the search iteration stage; the random factor R₄ is updated first to decide whether to enter global search or local search, and if R₄ ≥ ε, the global search stage is entered; in the global search stage, an adaptive factor w that changes with the iteration number is introduced, and the positions of the chimpanzee individuals in the population are updated according to the factor's change curve, by the formulas:

w = α(cosh(πt/T_max) + δ) (12)

X₁ = w₁ × {X_attacker − a₁|c₁X_attacker − m₁X|} (13)

X₂ = w₂ × {X_chaser − a₂|c₂X_chaser − m₂X|} (14)

X₃ = w₃ × {X_barrier − a₃|c₃X_barrier − m₃X|} (15)

X₄ = w₄ × {X_driver − a₄|c₄X_driver − m₄X|} (16)

X₅ = w₅ × {X_observer − a₅|c₅X_observer − m₅X|} (17)

(18)

Step 7: if R₄ < ε, entering the local search stage; to guard against the local optima that may occur in the early and late stages of the iteration, a check on the iteration count is introduced in the local search stage: if the current iteration number t is less than the specified iteration number t_*, the conversion parameter β is updated and the attacker's position information is updated with an improved sine-cosine algorithm; otherwise, Gaussian mutation is applied to the attacker's position,

after the sine-cosine algorithm is introduced, the position update formula of the attacker in the population is:

(19)

where X_attacker(t) denotes the position of the attacker in the t-th iteration; ρ₁, ρ₂, and ρ₃ are random numbers with ρ₁ ∈ [0, 2π], ρ₂ ∈ [0, 2], and ρ₃ ∈ [0, 1]; p(t) denotes the position of the currently optimal individual; and β is the conversion parameter, calculated by the formula:

(20)

where β_max and β_min are the maximum and minimum values of the conversion parameter, respectively, and T_max is the maximum number of iterations,

X'_attacker = X × [1 + k × N(0, 1)] (21)

where X'_attacker is the updated position vector, X is the position vector of the current chimpanzee individual, k is a variable that decreases within [0, 1], and N(0, 1) is a Gaussian-distributed random vector with mean 0 and variance 1;

Step 8: calculating the new fitness, optimal individual, and position information from the new solution obtained, and checking whether the algorithm meets the iteration termination condition; if the maximum number of iterations T_max has been reached, terminating and outputting the optimal position; otherwise, returning to Step 4 and re-executing.

6. The express customer loss prediction method based on value subdivision and integrated prediction according to claim 4, wherein, when the XGBoost model is constructed in step (3), the time-series survey data of customer loss under each influence index are used as the main feature input, the corresponding loss-influencing indexes are used as labels, and the values of the parameters are optimized by the chimpanzee optimization algorithm to minimize the objective function and thereby establish the optimal tree-structure model, specifically comprising the following steps:

Step 1: in the training stage of the BSGChOA_XGBoost model, obtaining the initial parameters required by the prediction model through optimization by the chimpanzee optimization algorithm;

Step 2: selecting part of the sample data in the criterion layer of the influence index set as a training set for model training and parameter optimization, and calculating the fitness function value of the model, the value representing the optimal solution obtained by each run of the chimpanzee optimization algorithm; using the remaining samples as a test set for the final evaluation of model performance; and performing sampling prediction on the training set and averaging the prediction results to obtain the final prediction result;

Step 3: verifying the trained model with the test set and, according to the fitness function value of the prediction algorithm, evaluating whether the value of each parameter has reached the current optimum; if so, replacing the original parameter; otherwise, keeping the current parameters;

Step 4: checking whether the algorithm meets the iteration termination condition; if the maximum number of iterations has been reached, terminating and outputting the optimal values of all parameters found during the iteration; otherwise, returning to Step 2 and re-executing.

7. The express customer loss prediction method based on value subdivision and integrated prediction according to claim 1, wherein the single customer loss rate prediction module comprises the following steps:

a first layer:

1) selecting n base classifiers and marking;

2) selecting the feature attribute sets F₁ = {b₁, b₂, ..., b₅} and F₃ = {b₈, b₉, ..., b₁₇} to generate data sets D1 and D2, respectively, the first m base classifiers being trained on the training sample set with D1 and D2 as input sets to obtain m prediction results P^k, and selecting F₄ = {b₁₈, b₁₉, ..., b₂₃} as the input data of the remaining n-m classifiers to obtain n-m prediction results P^k;

3) integrating the training results of the n individual models by constructing the n-dimensional feature vector P = (P₁, P₂, P₃, ..., P_n)^T;

A second layer:

1) the output feature vector P is used as the input of a linear classifier, and the weight of each model in the ensemble is learned by gradient descent; the final prediction result based on the n base prediction models is P^*, the final probability of customer loss obtained by weighting,

（22）

wherein,

representing the loss prediction probability generated by the j-th model,

representing the weight assigned to the jth model.

8. The express customer loss prediction method based on value subdivision and integrated prediction according to claim 7, wherein the data features in Step 1 are generated from the behavior features: customer base features F₁, order features F₂, customer-order interaction features F₃, and order liveness features F₄.

9. The express customer loss prediction method based on value subdivision and integrated prediction according to claim 1, wherein the customer saving scheme design module comprises the following steps:

Step 1: consulting the customer loss rate and customer loss amount predicted by the prediction model, revisiting the customers with the highest loss probability first, and offering preferential activities for returning old users;

Step 2: for the queue length L_s, an influence index that affects customer loss at the network point, constructing a target constraint model relating customer loss to waiting time, service intensity, and queue length,

the objective function is:

(23)

the constraint conditions are as follows:

（24）

where ρ is the average service intensity, the upper bound on ρ is the maximum service intensity that the service personnel can withstand, W_s is the average waiting time of the customer, W_s-max is the maximum residence time that the customer can endure, and L_s is the average queue length.